End-To-End Interpretable Motion Planner for Autonomous Vehicles

ABSTRACT

Systems and methods for generating motion plans including target trajectories for autonomous vehicles are provided. An autonomous vehicle may include or access a machine-learned motion planning model including a backbone network configured to generate a cost volume including data indicative of a cost associated with future locations of the autonomous vehicle. The cost volume can be generated from raw sensor data as part of motion planning for the autonomous vehicle. The backbone network can generate intermediate representations associated with object detections and object predictions. The motion planning model can include a trajectory generator configured to evaluate one or more potential trajectories for the autonomous vehicle and to select a target trajectory based at least in part on the cost volume generated by the backbone network.

RELATED APPLICATION

This application claims priority to and the benefit of U.S. Provisional Patent Application No. 62/768,847, titled “End-to-End Interpretable Neural Motion Planner,” and filed on Nov. 16, 2018. U.S. Provisional Patent Application No. 62/768,847 is hereby incorporated by reference herein in its entirety.

FIELD

The present disclosure relates generally to improving the ability of computing devices to plan motion paths for autonomous vehicles.

BACKGROUND

An autonomous vehicle is a vehicle that is capable of sensing its environment and navigating without human input. In particular, an autonomous vehicle can observe its surrounding environment using a variety of sensors and can attempt to comprehend the environment by performing various processing techniques on data collected by the sensors. Given knowledge of its surrounding environment, the autonomous vehicle can identify an appropriate motion path for navigating through such surrounding environment.

SUMMARY

Aspects and advantages of embodiments of the present disclosure will be set forth in part in the following description, or may be learned from the description, or may be learned through practice of the embodiments.

One example aspect of the present disclosure is directed to an autonomous vehicle that includes one or more processors and one or more non-transitory computer-readable media that collectively store a machine-learned motion planning model that is configured to receive sensor data and map data associated with an environment external to an autonomous vehicle and process the sensor data and the map data to generate a target trajectory for the autonomous vehicle. The machine-learned motion planning model includes a backbone network configured to receive the sensor data and the map data and to generate a cost volume including data indicative of a cost associated with each of a plurality of future locations of the autonomous vehicle within a planning horizon. The backbone network is configured to generate one or more intermediate representations associated with at least one of an object detection or an object prediction by the backbone network. The machine-learned motion planning model includes a trajectory generator configured to select a target trajectory for the autonomous vehicle based at least in part on the cost volume generated by the backbone network. The one or more non-transitory computer-readable media collectively store instructions that, when executed by the one or more processors, cause the one or more processors to perform operations. The operations include obtaining the sensor data and the map data, inputting the sensor data and the map data into the machine-learned motion planning model, and receiving the target trajectory as an output of the machine-learned motion planning model.

Another example aspect of the present disclosure is directed to one or more non-transitory computer-readable media that collectively store a machine-learned motion planning model. The machine-learned motion planning model includes a backbone network configured to receive sensor data and map data associated with an environment external to an autonomous vehicle and to generate a cost volume including data indicative of a cost associated with each of a plurality of future locations of the autonomous vehicle within a planning horizon. The backbone network is configured to generate one or more intermediate representations associated with at least one of an object detection or an object prediction by the backbone network. The machine-learned motion planning model includes a trajectory generator configured to select a target trajectory for the autonomous vehicle based at least in part on the cost volume generated by the backbone network.

Yet another example aspect of the present disclosure is directed to a computer-implemented method of motion planning for an autonomous vehicle. The method includes obtaining, by a computing system comprising one or more computing devices, sensor data and map data associated with an environment external to the autonomous vehicle; generating, by the computing system using a backbone network of a machine-learned motion planning model, a cost volume indicative of a cost associated with each of a plurality of future locations of the autonomous vehicle; generating, by the computing system using the backbone network, one or more intermediate representations associated with one or more objects detected by the backbone network; and generating, by the computing system using a trajectory generator of the machine-learned motion planning model, a target trajectory for the autonomous vehicle based at least in part on the cost volume.

Other example aspects of the present disclosure are directed to systems, methods, vehicles, apparatuses, tangible, non-transitory computer-readable media, and memory devices for motion planning for autonomous vehicles.

These and other features, aspects, and advantages of various embodiments will become better understood with reference to the following description and appended claims. The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the present disclosure and, together with the description, serve to explain the related principles.

BRIEF DESCRIPTION OF THE DRAWINGS

Detailed discussion of embodiments directed to one of ordinary skill in the art is set forth in the specification, which makes reference to the appended figures, in which:

FIG. 1 depicts an example system overview including an autonomous vehicle according to example embodiments of the present disclosure;

FIG. 2 depicts an example computing environment including a motion planning system of a vehicle computing system for an autonomous vehicle according to example embodiments of the present disclosure;

FIG. 3 depicts an example computing environment including a machine-learned motion planning model of a motion planning system for an autonomous vehicle according to example embodiments of the present disclosure;

FIG. 4 depicts an example computing environment including a multi-headed convolutional neural network for a machine-learned motion planning model according to example embodiments of the present disclosure;

FIG. 5 depicts a flowchart diagram illustrating an example method for generating a target trajectory using a machine-learned motion planning model according to example embodiments of the present disclosure;

FIG. 6 depicts a flowchart diagram illustrating an example method for obtaining a set of potential trajectories for evaluation using a learned cost volume according to example embodiments of the present disclosure;

FIG. 7 depicts a graphical representation of generating a potential trajectory for an autonomous vehicle according to example embodiments of the present disclosure;

FIG. 8 depicts a flowchart diagram illustrating an example method for training a machine-learned motion planning model to generate target trajectories based on raw sensor data, including an optimization of intermediate representations for motion planning, according to example embodiments of the present disclosure;

FIG. 9 depicts example system units for performing operations and functions according to example embodiments of the present disclosure; and

FIG. 10 depicts example system components according to example implementations of the present disclosure.

DETAILED DESCRIPTION

Reference now will be made in detail to embodiments, one or more examples of which are illustrated in the drawings. Each example is provided by way of explanation of the embodiments, not limitation of the present disclosure. In fact, it will be apparent to those skilled in the art that various modifications and variations can be made to the embodiments without departing from the scope or spirit of the present disclosure. For instance, features illustrated or described as part of one embodiment can be used with another embodiment to yield a still further embodiment. Thus, it is intended that aspects of the present disclosure cover such modifications and variations.

Generally, the present disclosure is directed to improved systems and methods for motion planning in autonomous vehicles through the utilization of one or more machine-learned motion planning models. More particularly, a motion planning system for an autonomous vehicle is provided that includes at least one end-to-end learnable and interpretable motion planning model. For example, a machine-learned motion planning model in accordance with example embodiments may be configured to receive sensor data such as raw image data, light-detection and ranging (LIDAR) data, RADAR data, map data, etc., and to directly generate a motion plan for an autonomous vehicle based on the raw sensor data. The motion planning model in some examples may include a single neural network that takes raw sensor data and dynamic map data as an input, and predicts a cost map for motion planning as an output. The motion planning model may be configured to generate a target trajectory of a motion plan for an autonomous vehicle, as well as one or more intermediate representations of the environment external to the autonomous vehicle based on the sensor data. In this manner, the motion planning model may be trained end-to-end to provide motion planning for an autonomous vehicle based on raw sensor data, while also providing intermediate representations such as object detections and object predictions as an interpretable output. Accordingly, the motion planning model provides an end-to-end driving approach that is optimized for the motion planning task, while also providing intermediate representations that can be accessed to improve the effectiveness of the model, such as by training to optimize the intermediate representations for motion planning.

An autonomous vehicle (e.g., ground-based vehicle, aircraft, etc.) can include various systems and devices configured to control the operation of the vehicle. For example, an autonomous vehicle can include an onboard vehicle computing system (e.g., located on or within the autonomous vehicle) that is configured to operate the autonomous vehicle. The vehicle computing system can obtain sensor data from sensor(s) onboard the vehicle (e.g., cameras, LIDAR, RADAR, GPS, etc.), access map data associated with an environment external to the autonomous vehicle, and generate an appropriate motion plan through the vehicle's surrounding environment based on the sensor data and map data. To more accurately and efficiently generate a motion plan through the autonomous vehicle's surrounding environment, a machine-learned motion planning model that is trained end-to-end to generate motion plans based on input sensor data is provided according to example embodiments of the present disclosure.

The machine-learned motion planning model may be configured to generate a cost volume defining a cost of a plurality of positions or locations that an autonomous vehicle may take within a planning horizon. The cost volume may represent a measure of the desirability of each location or position that the autonomous vehicle may take. For example, the cost or other measure for a particular position may be indicative of a likelihood of safety or other parameter associated with the vehicle taking that position. A lower cost may be indicative of a lower likelihood of collision with another object at a particular location, for example, whereas a higher cost may be indicative of a higher likelihood of collision with another object at the particular location. Additionally, the machine-learned motion planning model may be configured to generate one or more interpretable intermediate representations based on the sensor data, such as three-dimensional object detections and/or motion predictions for detected objects. The machine-learned motion planning model can select a target trajectory based on the cost volume for the future AV locations. In some examples, the model can sample a set of diverse, physically possible trajectories for the autonomous vehicle and compute a score for the set of potential trajectories based on the cost volume. The model can select a target trajectory for the autonomous vehicle based on the trajectory scores for the set of potential trajectories. For example, the model can select the potential trajectory having the lowest trajectory score (also referred to as the minimum cost) as the target trajectory for the autonomous vehicle. In other examples, the model can select the target trajectory by optimizing a potential trajectory, such as a randomly sampled trajectory, based on the cost volume generated by the backbone network. The machine-learned motion planning model is able to utilize a nonparametric cost volume that can capture the uncertainty and multimodality in possible AV trajectories, such as the uncertainty and multimodality in changing a lane versus staying in a lane. By utilizing a motion planner that includes an end-to-end machine-learned motion planning model, the motion planning system is capable of handling complex urban scenarios that include traffic light handling, yielding, and interactions with multiple road users.

In accordance with example embodiments of the disclosed technology, an end-to-end interpretable and learnable motion planning model can generate accurate three-dimensional trajectories for an autonomous vehicle over a planning horizon (e.g., a few seconds). The model can take as input LIDAR data such as one or more LIDAR point clouds, as well as map data such as data for a high-definition map. Other sensor data may be used in addition to or in place of LIDAR data, such as image data and/or RADAR data. The motion planning model can generate one or more interpretable intermediate representations in the form of three-dimensional object detections and/or future motion forecasts over the planning horizon for the three-dimensional object detections. In some examples, the interpretable intermediate representations are provided as a first output of the motion planning model. The motion planning model can additionally generate a space-time cost volume that represents a cost associated with all locations that the autonomous vehicle can take within the planning horizon. The space-time cost volume can be generated as a second output of the motion planning model in some examples. A target trajectory can be selected based on the space-time cost volume. For example, one or more trajectory proposals (also referred to as potential trajectories) can be scored using the learned cost volume so that the motion planner can select the trajectory proposal having the minimum cost. The trajectory proposal having the minimum cost can be provided as a third output of the motion planning model in some examples.

The model can be jointly trained such that the intermediate representations are optimized for the end task of motion planning. For instance, the machine-learned motion planning model can be jointly trained for both motion planning and generating the intermediate representations based on one or more optimization parameters for motion planning. In this manner, the motion planning model can provide interpretability and handle multimodality naturally. Moreover, the motion planning model can handle uncertainty naturally, as this may be represented in the cost volume. In some examples, this approach can avoid costly parameter tuning by enabling the model to learn concepts that are difficult to specify by hand, such as “slowing down when approaching an occlusion.” Additionally, the motion planning model can be trained by back-propagating feedback from the motion planning system, such as to optimize object detection and/or object prediction.

In accordance with example embodiments, the machine-learned motion planning model can be trained end-to-end with a multitask objective. In some examples, the machine-learned motion planning model can be trained based at least in part on multitask training with supervision for both perception and motion planning. The machine-learned motion planning model can be trained using a total loss function that includes a perception loss component as well as a motion planning loss component. The planning loss can encourage the minimum-cost plan selected by the model to be similar to a trajectory performed by a human demonstrator in some examples. A motion planning loss component can be generated based at least in part on one or more human-driven trajectories in some examples. Such a loss can be a sparse loss in some instances, as a ground truth trajectory may only occupy a small portion of the possible space. Because learning with a planning loss alone may be slow and difficult, embodiments of the disclosed technology may also utilize a perception loss that encourages the intermediate representations to produce accurate three-dimensional detections as well as motion forecasting. This can enable the interpretability of the intermediate representations and may enable faster training of the motion planning model.
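
By way of a non-limiting illustration, the following sketch shows how such a multitask training step might be assembled. It is a minimal sketch under stated assumptions rather than a definitive implementation: the model interface returning classification scores, box regressions, and a cost volume is assumed, the weighting factor beta is illustrative, the batch dimension is omitted, and perception_loss and planning_loss refer to the sketches given after the next two paragraphs.

    import torch

    def train_step(model, optimizer, lidar, maps,
                   perception_targets, planning_targets, beta=1.0):
        # One end-to-end training step with a multitask objective:
        # supervision for both perception and motion planning.
        optimizer.zero_grad()
        class_scores, box_reg, cost_volume = model(lidar, maps)
        loss = (perception_loss(class_scores, box_reg, *perception_targets)
                + beta * planning_loss(cost_volume, *planning_targets))
        loss.backward()   # planning feedback also reaches the perception branch
        optimizer.step()
        return loss.item()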

In some examples, the perception loss component can include a classification loss and/or a regression loss. The classification loss can be utilized for distinguishing a vehicle from the background, while the regression loss can be used for generating precise object bounding boxes. For a predefined anchor box, the motion planning model can output a classification score as well as several regression targets. The classification score can indicate the probability of existence of a vehicle at a particular anchor box. A cross entropy loss can be used in some examples for the classification. The regression outputs can include information indicative of position, shape, and heading angle at each of a plurality of time frames. The regression loss can be summed over all vehicle-correlated anchor boxes, from the current time frame through the prediction horizon. In this manner, the regression loss can be used to train the model to predict the position of vehicles at every time frame. Each anchor box can be associated with its neighboring ground truth bounding box in order to find a training label for each anchor box. Any non-assigned ground truth boxes can also be associated with their nearest neighbor.
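
A minimal sketch of such a perception loss follows. The tensor shapes, the five regression targets per time frame, and the use of a smooth L1 penalty for regression are illustrative assumptions; the description above specifies only a cross entropy classification loss and a regression loss summed over vehicle-correlated anchor boxes and time frames.

    import torch
    import torch.nn.functional as F

    def perception_loss(class_scores, box_reg, labels, reg_targets, pos_mask):
        # class_scores: (A, 2) logits per anchor (background vs. vehicle).
        # labels: (A,) anchor labels from nearest-neighbor ground truth matching.
        # box_reg / reg_targets: (A, T, 5) per-anchor outputs for each time
        # frame, e.g. position offsets, shape, and heading angle.
        # pos_mask: (A,) boolean mask of vehicle-correlated anchors.
        cls = F.cross_entropy(class_scores, labels)
        # Regression is supervised only on vehicle-correlated anchors and is
        # summed from the current time frame through the prediction horizon.
        reg = F.smooth_l1_loss(box_reg[pos_mask], reg_targets[pos_mask],
                               reduction='sum')
        return cls + reg / max(int(pos_mask.sum()), 1)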

The planning loss component can utilize a max-margin loss in which the ground truth trajectory is used as a positive training example and randomly sampled trajectories are utilized as negative training examples. Such an approach can overcome the difficulty associated with learning a reasonable cost volume in situations where a ground truth cost volume is not available. The motion planning model can be trained to encourage the ground truth trajectory to have the minimum cost and other trajectories to have higher costs. The discrepancy between the ground truth trajectory and a negative trajectory sample can be utilized during training. The distance between a negative trajectory and the ground truth trajectory can be used to encourage negative trajectories far from the ground truth trajectory to have a much higher cost. A discrepancy can be computed between the ground truth trajectory and each negative sample, followed by optimizing the worst case with the max operation. Such an approach may encourage the motion planning model to learn a cost volume that discriminates “good” trajectories from “bad” trajectories.
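
The sketch below illustrates one way to realize such a max-margin planning loss, assuming the trajectories have already been discretized into cost volume indices. The hinge form, with the distance between each negative sample and the ground truth trajectory serving as the margin, is an illustrative reading of the description above.

    import torch

    def planning_loss(cost_volume, gt_idx, neg_idx, distances):
        # cost_volume: (T, H, W) learned cost maps, one per future timestep.
        # gt_idx: (T, 2) waypoint indices of the ground truth trajectory.
        # neg_idx: (N, T, 2) waypoints of N randomly sampled negatives.
        # distances: (N,) distance of each negative from the ground truth,
        # used as a margin so far-away negatives must cost much more.
        t = torch.arange(cost_volume.shape[0])
        gt_cost = cost_volume[t, gt_idx[:, 0], gt_idx[:, 1]].sum()
        neg_costs = torch.stack(
            [cost_volume[t, n[:, 0], n[:, 1]].sum() for n in neg_idx])
        # Hinge per negative sample, then optimize the worst case via max.
        return torch.relu(gt_cost - neg_costs + distances).max()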

A machine-learned motion planning model in accordance with example embodiments may include a backbone network as well as a trajectory generator. The backbone network can be configured to take as input LIDAR data and map data in some examples, and provide as output one or more intermediate representations, such as bounding boxes of other objects for future timesteps, as well as the cost volume for planning with a predetermined number of filters corresponding to the timesteps. In other examples, the backbone network may use image data and/or RADAR data, alone or in combination with LIDAR data. The trajectory generator can select a target trajectory for the autonomous vehicle based on the cost volume. For example, the trajectory generator can obtain a set of potential trajectories of the autonomous vehicle and generate a trajectory score for each potential trajectory using the cost volume from the backbone network. The trajectory generator can index the cost of each potential trajectory from different filters of the cost volume and sum them together to generate a trajectory score in some examples. The trajectory generator can select the trajectory with the minimum cost for final motion planning in some examples. In another example, the trajectory generator can optimize a single sampled trajectory using the cost volume. For example, the trajectory generator can include an optimizer that optimizes a sampled trajectory by minimizing the cost computed for the trajectory using the cost volume.

In accordance with some embodiments, the motion planning model can be configured to formulate a motion planning task as a deep structured minimization problem. The minimization can be approximated by sampling a set of physically valid trajectories and picking the trajectory having the minimum cost using a cost volume. The cost volume can be a learned cost volume generated by a convolutional neural network backbone. The convolutional neural network can extract features from both the LIDAR data and the map data to generate a feature map.

A feature map may include one or more sensor features (e.g., LIDAR features) and one or more map features in some examples. In some examples, the LIDAR data can be rasterized into a three-dimensional occupancy grid, where each voxel has a binary value indicating whether it contains a LIDAR point. This can result in a three-dimensional tensor representing the height and x-y spatial dimensions of each LIDAR point, respectively. The map data can be utilized to provide accurate motion planning by enabling the autonomous vehicle to follow traffic rules and other external constraints, for example. In some examples, high-definition maps can contain information about the semantics of a scene, such as the location of a lane, the lane boundary shape (e.g., solid, dashed, etc.), and the location of signs (e.g., stop signs, etc.). The map data can be rasterized to form an M-channel tensor, where each channel represents a different map element, including roads, intersections, lanes, lane boundaries, traffic lights, etc. The backbone network can include a plurality of blocks, where each block includes one or more convolutional layers. A multiscale feature map can be generated after a first portion of the blocks and then fed into a final block. In some examples, a feature map can include a three-dimensional LIDAR tensor as well as an M-channel map tensor.
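
As a concrete illustration, the following sketch rasterizes a LIDAR point cloud into a binary occupancy grid and stacks pre-rasterized map layers into an M-channel tensor. The grid extents, the 0.2 m resolution, and the assumption that each map element is provided as a binary mask are illustrative choices, not part of the disclosure.

    import numpy as np

    def rasterize_inputs(points, map_layers, x_range=(-40.0, 40.0),
                         y_range=(-70.0, 70.0), z_range=(-2.0, 3.0), res=0.2):
        # points: (P, 3) LIDAR returns (x, y, z) in the vehicle frame.
        # map_layers: list of M binary (H, W) masks (roads, lanes, lights, ...).
        H = int((x_range[1] - x_range[0]) / res)
        W = int((y_range[1] - y_range[0]) / res)
        Z = int((z_range[1] - z_range[0]) / res)
        occupancy = np.zeros((Z, H, W), dtype=np.float32)
        ix = ((points[:, 0] - x_range[0]) / res).astype(int)
        iy = ((points[:, 1] - y_range[0]) / res).astype(int)
        iz = ((points[:, 2] - z_range[0]) / res).astype(int)
        keep = (ix >= 0) & (ix < H) & (iy >= 0) & (iy < W) & (iz >= 0) & (iz < Z)
        occupancy[iz[keep], ix[keep], iy[keep]] = 1.0  # voxel contains a point
        map_tensor = np.stack(map_layers).astype(np.float32)  # one channel per element
        return occupancy, map_tensor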

The motion planning model can feed the feature map into two branches of convolutional layers that output the intermediate representations (e.g., three-dimensional detections and motion forecasting) and the cost volume, respectively. A first branch can be implemented as a perception header or model head including one or more classification layers and one or more regression layers. A second branch can be implemented as a cost volume header or model head including one or more deconvolution layers and one or more convolution layers. The perception header can be configured to generate one or more bounding boxes and/or motion forecasts corresponding to object detections based on the sensor data and/or map data. The cost volume header can be configured to generate a three-dimensional cost volume based on the sensor data and/or map data.

The perception header may include a classification component as well as a regression component according to some implementations. The two components may be formed of convolution layers in some examples. Multiple predefined anchor boxes (also referred to as anchors) can be defined at each feature map location. The different anchors at each location may include different sizes, aspect ratios, and orientations. The classification branch can output a score for each anchor indicating the probability of a vehicle or other object at each anchor's location. The regression branch can output regression targets for each anchor at different timesteps. This can include a localization offset, a size, and a heading angle. Regression can be performed at each timestep to produce motion forecasting for each vehicle or other object. The cost volume header can include several convolution and deconvolution layers. A convolution layer may be utilized that includes a number of filters corresponding to the planning horizon. Each filter can generate a cost volume for a future timestep. This enables the cost of any trajectory to be evaluated by simply indexing into the cost volume.
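
A minimal sketch of the two headers operating on a shared backbone feature map is given below. The channel counts, anchor count, horizon length, and number of regression targets per timestep are illustrative assumptions; only the overall structure, convolutional classification and regression branches plus a deconvolution-and-convolution cost volume branch with one filter per future timestep, follows the description above.

    import torch
    import torch.nn as nn

    class PlannerHeads(nn.Module):
        def __init__(self, in_ch=256, num_anchors=12, horizon=10, reg_dim=5):
            super().__init__()
            # Perception header: per-anchor classification scores and
            # per-timestep regression targets (offset, size, heading angle).
            self.cls_head = nn.Conv2d(in_ch, num_anchors, kernel_size=1)
            self.reg_head = nn.Conv2d(in_ch, num_anchors * horizon * reg_dim,
                                      kernel_size=1)
            # Cost volume header: deconvolution back toward input resolution,
            # then one filter per future timestep of the planning horizon.
            self.cost_head = nn.Sequential(
                nn.ConvTranspose2d(in_ch, 128, kernel_size=4, stride=2, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(128, horizon, kernel_size=3, padding=1),
            )

        def forward(self, feats):
            # feats: (B, in_ch, H, W) backbone feature map.
            return self.cls_head(feats), self.reg_head(feats), self.cost_head(feats)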

The trajectory generator can apply sampling or optimization to obtain a low-cost trajectory, by sampling a wide variety of diverse trajectories that can be executed by the autonomous vehicle or by optimizing one or more sampled trajectories. The trajectory generator can produce as output a target trajectory with a minimal cost according to the learned cost volume. In some examples, the trajectory generator can efficiently sample trajectories that are physically possible and evaluate the cost volume efficiently. A trajectory can be defined by the combination of a spatial path (e.g., a curve in the two-dimensional plane) and a velocity profile (e.g., how fast the autonomous vehicle goes along this path). To consider real-world constraints, the motion planning model can impose that the vehicle should follow a dynamical model. The motion planning model can generate a set of potential trajectories according to at least one of a speed constraint, an acceleration constraint, or a turning angle constraint defined by the dynamical model. Additionally, a planar curve such as a Clothoid curve, also known as an Euler spiral or Cornu spiral, can be used to represent the two-dimensional path of the autonomous vehicle. Finally, a longitudinal velocity can be defined that specifies the autonomous vehicle's motion along the autonomous vehicle path.

By utilizing planar curves, sampling a path of the autonomous vehicle can correspond to sampling according to a scaling factor based on velocity. The shape of the planar curve (e.g., Clothoid curve) can be fixed based on the scaling factor that is sampled, and the autonomous vehicle steering angle can be used to find a corresponding position on the curve. Constant accelerations can also be sampled which specify the autonomous vehicle's velocity profile. By combining the sampled curves and the velocity profiles, the trajectories can be projected into discrete timesteps, and the corresponding waypoints can be obtained at which to evaluate the learned cost.
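
The following sketch samples one such trajectory under stated assumptions: a Clothoid-like path whose curvature grows linearly with arc length, integrated with a simple Euler step, combined with a sampled constant acceleration. The sampling bounds and timestep are illustrative, and the steering-angle lookup on the curve is omitted for brevity.

    import numpy as np

    def sample_trajectory(v0, dt=0.5, horizon=10, rng=None):
        rng = np.random.default_rng() if rng is None else rng
        # Velocity profile: a sampled constant acceleration (speed clamped at 0).
        accel = rng.uniform(-3.0, 2.0)
        # Clothoid-like path: curvature kappa(s) = c * s, with c sampled.
        c = rng.uniform(-0.02, 0.02)
        x = y = theta = s = 0.0
        v = v0
        waypoints = []
        for _ in range(horizon):
            v = max(v + accel * dt, 0.0)   # no reversing along the path
            ds = v * dt
            theta += c * s * ds            # d(theta) = kappa(s) * ds
            x += ds * np.cos(theta)
            y += ds * np.sin(theta)
            s += ds
            waypoints.append((x, y))
        return np.array(waypoints)         # (horizon, 2), one waypoint per timestep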

After defining a set of potential trajectories from the plurality of possible trajectories, the trajectory generator can evaluate the set of potential trajectories to select a target trajectory for motion planning. The trajectory generator can access the cost volume generated by the backbone network to generate a set of timestep cost indices for each potential trajectory. For example, the trajectory generator can generate a timestep cost index from each filter of the cost volume corresponding to a particular timestep. The trajectory generator can sum the timestep cost indices from the different filters to calculate a trajectory score for a potential trajectory. The trajectory generator can select the potential trajectory having the minimum total cost index, or trajectory score, as the target trajectory for final motion planning by the autonomous vehicle.
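
Tying these pieces together, the sketch below discretizes sampled waypoints into grid indices (mirroring the extents assumed in the rasterization sketch above), looks up one timestep cost index per filter, sums the lookups into a trajectory score, and returns the minimum-cost trajectory. All names and grid parameters are illustrative.

    import numpy as np

    def select_target_trajectory(cost_volume, trajectories,
                                 x_range=(-40.0, 40.0), y_range=(-70.0, 70.0),
                                 res=0.2):
        # cost_volume: (T, H, W) -- filter t holds the cost map for timestep t.
        # trajectories: (N, T, 2) continuous (x, y) waypoints per timestep,
        # e.g. produced by sample_trajectory above.
        T, H, W = cost_volume.shape
        ix = np.clip(((trajectories[..., 0] - x_range[0]) / res).astype(int), 0, H - 1)
        iy = np.clip(((trajectories[..., 1] - y_range[0]) / res).astype(int), 0, W - 1)
        t = np.arange(T)
        # One timestep cost index per filter, summed into a trajectory score.
        scores = cost_volume[t, ix, iy].sum(axis=1)   # (N,)
        best = int(scores.argmin())
        return trajectories[best], float(scores[best])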

Accordingly, an autonomous vehicle in accordance with example embodiments of the disclosed technology may include a vehicle computing system configured to perform motion planning for the autonomous vehicle. The vehicle computing system can include a motion planning system, for example. The motion planning system can include a machine-learned motion planning model that is configured to receive sensor data and map data associated with an environment external to the autonomous vehicle. The motion planning model can process the sensor data and the map data to generate a target trajectory for the autonomous vehicle. The motion planning model can include a backbone network that is configured to receive the sensor data and the map data and to generate a cost volume that includes data indicative of a cost associated with each of a plurality of future locations of the autonomous vehicle within a planning horizon.

The backbone network can be configured to generate one or more intermediate representations associated with at least one of an object detection or an object prediction by the backbone network. In some examples, an intermediate representation may include a three-dimensional object detection and/or a motion prediction or forecast associated with the object detection. For example, an intermediate representation can include a bounding box associated with an object detection and/or one or more motion predictions associated with an object prediction. The backbone network can include a perception header that is configured to generate the intermediate representations based at least in part on the sensor data and the map data. The backbone network can include a cost volume header that is configured to generate a cost volume based at least in part on the sensor data and the map data. The perception header can include a first set of one or more convolutional network layers. The cost volume header can include a second set of one or more convolutional network layers. In some examples, the motion planning model can be trained such that the first set of convolutional network layers for object detection is optimized based at least in part on an output of the second set of convolutional network layers for generating the cost volume.

In accordance with some embodiments, a motion plan can be generated for an autonomous vehicle by obtaining sensor data and map data associated with an environment external to the autonomous vehicle. A vehicle computing system can generate a cost volume indicative of a cost associated with each of a plurality of future locations of the autonomous vehicle by utilizing a backbone network of a machine-learned motion planning model. The vehicle computing system can generate one or more intermediate representations associated with one or more objects detected by the backbone network. The vehicle computing system can obtain a set of potential trajectories for the autonomous vehicle. A trajectory generator of the vehicle computing system can generate a respective cost for each potential trajectory of the set based at least in part on the cost volume. The trajectory generator can select the potential trajectory having the minimum trajectory cost as the target trajectory for the autonomous vehicle.

Embodiments in accordance with the disclosed technology provide a number of technical effects and benefits, particularly in the areas of computing technology, autonomous vehicles, and the integration of computing technology with autonomous vehicles. In particular, example implementations of the disclosed technology provide improved techniques for generating motion plans such as target trajectories for autonomous vehicles. For example, by utilizing one or more implementations of the disclosed technology, a vehicle computing system can more accurately and efficiently generate motion plans for an autonomous vehicle and thereby enable the autonomous vehicle to drive autonomously in complex scenarios that may include traffic light handling, yielding, and interactions with multiple actors such as pedestrians and other vehicles.

A holistic machine-learned motion planning model is provided that can take as input raw sensor data and produce a cost volume defining a cost associated with each position that the autonomous vehicle can take within a planning horizon. In addition, the motion planning model can produce interpretable intermediate representations in the form of three-dimensional detections and future trajectories. Such a model provides an end-to-end driving approach that can avoid shortcomings associated with traditional engineering stacks that divide the driving problem into subtasks including perception, prediction, motion planning, and control. For example, the machine-learned motion planning model can be trained to optimize the end task of motion planning based on raw sensor data. More particularly, the machine-learned motion planning model can be trained such that intermediate representations including object detections and motion predictions are optimized for the end task of motion planning rather than for their particular subtasks. Such an approach can provide more accurate motion plans based on more optimal intermediate representations. Moreover, a nonparametric cost volume as described is able to capture the uncertainty and multi-modality that is possible with autonomous vehicle trajectories, naturally capturing the multi-modality and uncertainty present in autonomous driving.

Additionally, the utilization of a trajectory generator can enable accurate and efficient generation of potential trajectories from which the target trajectory for the autonomous vehicle can be selected. For instance, a large number of trajectories may be possible for an autonomous vehicle, which can lead to a minimization problem whose optimization is NP-hard, and thus may result in an inefficient determination of potential trajectories. To overcome such drawbacks, sampling can be utilized by the trajectory generator to obtain a low-cost trajectory in a timely and computationally efficient manner. Trajectories can be efficiently sampled that are physically possible so that the cost volume can be evaluated efficiently. Recognizing that an autonomous vehicle cannot execute every possible set of points in Cartesian space, sampling trajectories as arbitrary sets of points in the available space may not be an optimal approach. Accordingly, the trajectory generator can be configured to consider real-world constraints such as physical limits on speed, acceleration, and turning angle. By employing dynamical models, efficiencies in computing processing requirements and time can be achieved.

Compared with traditional machine-learned model approaches, such as imitation learning approaches that directly regress a steering angle from raw sensor data, a machine-learned model in accordance with the disclosed technology may provide interpretability and handle multi-modality naturally. Likewise, when compared with traditional approaches which use manually designed cost functions built on top of perception and prediction systems, a motion planning model in accordance with the disclosed technology can provide the advantage of being jointly trained. Thus, learned representations that are more optimal for the end task of motion planning can be provided. Additionally, an interpretable machine-learned motion planning model in accordance with the disclosed technology enables feedback to be backpropagated from the motion planning system as part of optimizing the generation of intermediate representations. Furthermore, the machine-learned motion planning model can handle uncertainty naturally, as this is represented in the cost volume, and does not require costly parameter tuning, since concepts can be learned that are difficult to specify by hand. By utilizing a machine-learned motion planning model that handles uncertainty as well as multimodality, an autonomous vehicle can increase the accuracy and efficiency of motion planning in real time and thereby increase the safety and reliability of autonomous vehicles.

The autonomous vehicle technology described herein can help improve the safety of passengers of an autonomous vehicle, improve the safety of the surroundings of the autonomous vehicle, improve the experience of the rider and/or operator of the autonomous vehicle, as well as provide other improvements as described herein. Moreover, the autonomous vehicle technology of the present disclosure can help improve the ability of an autonomous vehicle to effectively provide vehicle services to others and support the various members of the community in which the autonomous vehicle is operating, including persons with reduced mobility and/or persons that are underserved by other transportation options. Additionally, the autonomous vehicle of the present disclosure may reduce traffic congestion in communities as well as provide alternate forms of transportation that may provide environmental benefits.

With reference now to the figures, example embodiments of the present disclosure will be discussed in further detail.

FIG. 1 illustrates an example vehicle computing system 110 according to example embodiments of the present disclosure. The vehicle computing system 110 can be associated with a vehicle 102. The vehicle computing system 110 can be located onboard (e.g., included on and/or within) the vehicle 102.

The vehicle 102 incorporating the vehicle computing system 110 can be various types of vehicles. In some implementations, the vehicle 102 can be an autonomous vehicle. For instance, the vehicle 102 can be a ground-based autonomous vehicle such as an autonomous car, autonomous truck, autonomous bus, etc. The vehicle 102 can be an air-based autonomous vehicle (e.g., airplane, helicopter, or other aircraft) or another type of vehicle (e.g., bike, scooter, watercraft, etc.). The vehicle 102 can drive, navigate, operate, etc. with minimal and/or no interaction from a human operator 106 (e.g., driver). An operator 106 (also referred to as a vehicle operator) can be included in the vehicle 102 and/or remote from the vehicle 102. Moreover, in some implementations, the vehicle 102 can be a non-autonomous vehicle. The operator 106 can be associated with the vehicle 102 to take manual control of the vehicle, if necessary. For instance, in a testing scenario, a vehicle 102 can be periodically tested with controlled faults that can be injected into an autonomous vehicle's autonomy system 130. This can help test the vehicle's response to certain scenarios. A vehicle operator 106 can be located within the vehicle 102 and/or remote from the vehicle 102 to take control of the vehicle 102 (e.g., in the event the fault results in the vehicle exiting from a fully autonomous mode in the testing environment).

The vehicle 102 can be configured to operate in a plurality of operating modes. For example, the vehicle 102 can be configured to operate in a fully autonomous (e.g., self-driving) operating mode in which the vehicle 102 is controllable without user input (e.g., can drive and navigate with no input from a vehicle operator present in the vehicle 102 and/or remote from the vehicle 102). The vehicle 102 can operate in a semi-autonomous operating mode in which the vehicle 102 can operate with some input from a vehicle operator present in the vehicle 102 (and/or a human operator that is remote from the vehicle 102). The vehicle 102 can enter into a manual operating mode in which the vehicle 102 is fully controllable by a vehicle operator 106 (e.g., human driver, pilot, etc.) and can be prohibited and/or disabled (e.g., temporarily, permanently, etc.) from performing autonomous navigation (e.g., autonomous driving). In some implementations, the vehicle 102 can implement vehicle operating assistance technology (e.g., collision mitigation system, power assist steering, etc.) while in the manual operating mode to help assist the vehicle operator 106 of the vehicle 102. For example, a collision mitigation system can utilize information concerning vehicle trajectories within the vehicle's surrounding environment to help an operator avoid collisions even when in manual mode.

The operating modes of the vehicle 102 can be stored in a memory onboard the vehicle 102. For example, the operating modes can be defined by an operating mode data structure (e.g., rule, list, table, etc.) that indicates one or more operating parameters for the vehicle 102 while in the particular operating mode. For example, an operating mode data structure can indicate that the vehicle 102 is to autonomously plan its motion when in the fully autonomous operating mode. The vehicle computing system 110 can access the memory when implementing an operating mode.

The operating mode of the vehicle 102 can be adjusted in a variety of manners. For example, the operating mode of the vehicle 102 can be selected remotely, off-board the vehicle 102. For example, a remote computing system (e.g., of a vehicle provider and/or service entity associated with the vehicle 102) can communicate data to the vehicle 102 instructing the vehicle 102 to enter into, exit from, maintain, etc. an operating mode. For example, in some implementations, the remote computing system can be an operations computing system 180, as disclosed herein. By way of example, such data communicated to a vehicle 102 by the operations computing system 180 can instruct the vehicle 102 to enter into the fully autonomous operating mode. In some implementations, the operating mode of the vehicle 102 can be set onboard and/or near the vehicle 102. For example, the vehicle computing system 110 can automatically determine when and where the vehicle 102 is to enter, change, maintain, etc. a particular operating mode (e.g., without user input). Additionally, or alternatively, the operating mode of the vehicle 102 can be manually selected via one or more interfaces located onboard the vehicle 102 (e.g., key switch, button, etc.) and/or associated with a computing device proximate to the vehicle 102 (e.g., a tablet operated by authorized personnel located near the vehicle 102). In some implementations, the operating mode of the vehicle 102 can be adjusted by manipulating a series of interfaces in a particular order to cause the vehicle 102 to enter into a particular operating mode.

The vehicle computing system 110 can include one or more computing devices located onboard the vehicle 102. For example, the computing device(s) can be located on and/or within the vehicle 102. The computing device(s) can include various components for performing various operations and functions. For instance, the computing device(s) can include one or more processors and one or more tangible, non-transitory, computer readable media (e.g., memory devices, etc.). The one or more tangible, non-transitory, computer readable media can store instructions that when executed by the one or more processors cause the vehicle 102 (e.g., its computing system, one or more processors, etc.) to perform operations and functions, such as those described herein for generating target trajectories.

The vehicle 102 can include a communications system 112 configured to allow the vehicle computing system 110 (and its computing device(s)) to communicate with other computing devices. The vehicle computing system 110 can use the communications system 112 to communicate with one or more computing device(s) that are remote from the vehicle 102 over one or more networks (e.g., via one or more wireless signal connections). For example, the communications system 112 can allow the vehicle computing system 110 to communicate with an operations computing system 180. By way of example, the operations computing system 180 can include one or more remote servers communicatively linked to the vehicle computing system 110. In some implementations, the communications system 112 can allow communication among one or more of the system(s) onboard the vehicle 102. The communications system 112 can include any suitable components for interfacing with one or more network(s), including, for example, transmitters, receivers, ports, controllers, antennas, and/or other suitable components that can help facilitate communication.

As shown in FIG. 1, the vehicle 102 can include one or more vehicle sensor(s) 116, an autonomy computing system 130, one or more vehicle control systems 120, one or more positioning systems 114, and other systems, as described herein. One or more of these systems can be configured to communicate with one another via a communication channel. The communication channel can include one or more data buses (e.g., controller area network (CAN)), an onboard diagnostics connector (e.g., OBD-II), and/or a combination of wired and/or wireless communication links. The onboard systems can send and/or receive data, messages, signals, etc. amongst one another via the communication channel.

The vehicle sensor(s) 116 can be configured to acquire sensor data 118. This can include sensor data associated with the surrounding environment of the vehicle 102. For instance, the sensor data 118 can include two-dimensional data depicting the surrounding environment of the vehicle 102. In addition, or alternatively, the sensor data 118 can include three-dimensional data associated with the surrounding environment of the vehicle 102. For example, the sensor(s) 116 can be configured to acquire image(s) and/or other two- or three-dimensional data within a field of view of one or more of the vehicle sensor(s) 116. The vehicle sensor(s) 116 can include a Light Detection and Ranging (LIDAR) system, a Radio Detection and Ranging (RADAR) system, one or more cameras (e.g., visible spectrum cameras, infrared cameras, etc.), motion sensors, and/or other types of two-dimensional and/or three-dimensional capturing devices. The sensor data 118 can include image data, radar data, LIDAR data, and/or other data acquired by the vehicle sensor(s) 116. For example, the vehicle sensor(s) 116 can include a front-facing RGB camera mounted on top of the vehicle 102 and the sensor data 118 can include an RGB image depicting the surrounding environment of the vehicle 102. In addition, or alternatively, the vehicle sensor(s) 116 can include one or more LIDAR sensor(s) and the sensor data 118 can include one or more sparse sets of LIDAR measurements. Moreover, the vehicle 102 can also include other sensors configured to acquire data associated with the vehicle 102. For example, the vehicle 102 can include inertial measurement unit(s), wheel odometry devices, and/or other sensors. In some implementations, the sensor data 118 and/or map data 132 can be processed to select one or more target trajectories for traversing within the surrounding environment of the vehicle 102.

In addition to the sensor data 118, the autonomy computing system 130 can retrieve or otherwise obtain map data 132. The map data 132 can provide static world representations about the surrounding environment of the vehicle 102. For example, in some implementations, a vehicle 102 can exploit prior knowledge about the static world by building very detailed maps (HD maps) that represent not only the roads, buildings, bridges, and landmarks, but also traffic lanes, signs, and lights, to centimeter-accurate three-dimensional representations. More particularly, map data 132 can include information regarding: the identity and location of different roadways, road segments, buildings, or other items or objects (e.g., lampposts, crosswalks, curbing, etc.); the location and directions of traffic lanes (e.g., the location and direction of a parking lane, a turning lane, a bicycle lane, or other lanes within a particular roadway or other travel way and/or one or more boundary markings associated therewith); traffic control data (e.g., the location and instructions of signage, traffic lights, or other traffic control devices); the location of obstructions (e.g., roadwork, accidents, etc.); data indicative of events (e.g., scheduled concerts, parades, etc.); and/or any other data that provides information that assists the vehicle 102 in comprehending and perceiving its surrounding environment and its relationship thereto.

The vehicle 102 can include a positioning system 114. The positioning system 114 can determine a current position of the vehicle 102. The positioning system 114 can be any device or circuitry for analyzing the position of the vehicle 102. For example, the positioning system 114 can determine a position by using one or more of inertial sensors (e.g., inertial measurement unit(s), etc.), a satellite positioning system, based on IP address, by using triangulation and/or proximity to network access points or other network components (e.g., cellular towers, WiFi access points, etc.), and/or other suitable techniques. The position of the vehicle 102 can be used by various systems of the vehicle computing system 110 and/or provided to a remote computing system. For example, the map data 132 can provide the vehicle 102 relative positions of the elements of a surrounding environment of the vehicle 102. The vehicle 102 can identify its position within the surrounding environment (e.g., across six axes, etc.) based at least in part on the map data 132. For example, the vehicle computing system 110 can process the sensor data 118 (e.g., LIDAR data, camera data, etc.) to match it to a map of the surrounding environment to get an understanding of the vehicle's position within that environment.

The autonomy computing system 130 can include a perception system 140, a prediction system 150, a motion planning system 160, and/or other systems that cooperate to perceive the surrounding environment of the vehicle 102 and determine a motion plan for controlling the motion of the vehicle 102 accordingly.

For example, the autonomy computing system 130 can obtain the sensor data 118 from the vehicle sensor(s) 116, process the sensor data 118 (and/or other data) to perceive its surrounding environment, predict the motion of objects within the surrounding environment, and generate an appropriate motion plan through such surrounding environment. The autonomy computing system 130 can communicate with the one or more vehicle control systems 120 to operate the vehicle 102 according to the motion plan.

The vehicle computing system 110 (e.g., the autonomy computing system 130) can identify one or more objects that are proximate to the vehicle 102 based at least in part on the sensor data 118 and/or the map data 132. For example, the vehicle computing system 110 (e.g., the perception system 140) can process the sensor data 118, the map data 132, etc. to obtain perception data 142. The vehicle computing system 110 can generate perception data 142 that is indicative of one or more states (e.g., current and/or past state(s)) of a plurality of objects that are within a surrounding environment of the vehicle 102. For example, the perception data 142 for each object can describe (e.g., for a given time, time period) an estimate of the object's: current and/or past location (also referred to as position); current and/or past speed/velocity; current and/or past acceleration; current and/or past heading; current and/or past orientation; size/footprint (e.g., as represented by a bounding shape); class (e.g., pedestrian class vs. vehicle class vs. bicycle class); the uncertainties associated therewith; and/or other state information. The perception system 140 can provide the perception data 142 to the prediction system 150, the motion planning system 160, and/or other system(s).

The prediction system 150 can be configured to predict a motion of the object(s) within the surrounding environment of the vehicle 102. For instance, the prediction system 150 can generate prediction data 152 associated with such object(s). The prediction data 152 can be indicative of one or more predicted future locations of each respective object. For example, the prediction system 150 can determine a predicted motion trajectory along which a respective object is predicted to travel over time. A predicted motion trajectory can be indicative of a path that the object is predicted to traverse and an associated timing with which the object is predicted to travel along the path. The predicted path can include and/or be made up of a plurality of waypoints. In some implementations, the prediction data 152 can be indicative of the speed and/or acceleration at which the respective object is predicted to travel along its associated predicted motion trajectory. The prediction system 150 can output the prediction data 152 (e.g., indicative of one or more of the predicted motion trajectories) to the motion planning system 160.

The vehicle computing system 110 (e.g., the motion planning system 160) can determine a motion plan 162 for the vehicle 102 based at least in part on the perception data 142, the prediction data 152, and/or other data.

A motion plan 162 can include vehicle actions (e.g., planned vehicle trajectories, speed(s), acceleration(s), other actions, etc.) with respect to one or more of the objects within the surrounding environment of the vehicle 102 as well as the objects' predicted movements. For instance, the motion planning system 160 can implement an optimization algorithm, model, etc. that considers cost data associated with a vehicle action as well as other objective functions (e.g., cost functions based on speed limits, traffic lights, etc.), if any, to determine optimized variables that make up the motion plan 162. The motion planning system 160 can determine that the vehicle 102 can perform a certain action (e.g., pass an object, etc.) without increasing the potential risk to the vehicle 102 and/or violating any traffic laws (e.g., speed limits, lane boundaries, signage, etc.). For instance, the motion planning system 160 can evaluate one or more of the predicted motion trajectories of one or more objects during its cost data analysis as it determines an optimized vehicle trajectory through the surrounding environment. The motion planning system 160 can generate cost data associated with such trajectories. In some implementations, one or more of the predicted motion trajectories may not ultimately change the motion of the vehicle 102 (e.g., due to an overriding factor). In some implementations, the motion plan 162 may define the vehicle's motion such that the vehicle 102 avoids the object(s), reduces speed to give more leeway to one or more of the object(s), proceeds cautiously, performs a stopping action, etc.

The motion planning system 160 can be configured to continuously update the vehicle's motion plan 162 and a corresponding planned vehicle motion trajectory. For example, in some implementations, the motion planning system 160 can generate new motion plan(s) for the vehicle 102 (e.g., multiple times per second). Each new motion plan can describe a motion of the vehicle 102 over the next planning period (e.g., next several seconds). Moreover, a new motion plan may include a new planned vehicle motion trajectory. Thus, in some implementations, the motion planning system 160 can continuously operate to revise or otherwise generate a short-term motion plan based on the currently available data. Once the optimization planner has identified the optimal motion plan (or some other iterative break occurs), the optimal motion plan (and the planned motion trajectory) can be selected and executed by the vehicle 102.

The vehicle computing system 110 can cause the vehicle 102 to initiate a motion control in accordance with at least a portion of the motion plan 162. A motion control can be an operation, action, etc. that is associated with controlling the motion of the vehicle. For instance, the motion plan 162 can be provided to the vehicle control system(s) 120 of the vehicle 102. The vehicle control system(s) 120 can be associated with a vehicle controller (e.g., including a vehicle interface) that is configured to implement the motion plan 162. The vehicle controller can, for example, translate the motion plan into instructions for the appropriate vehicle control component (e.g., acceleration control, brake control, steering control, etc.). By way of example, the vehicle controller can translate a determined motion plan 162 into instructions to adjust the steering of the vehicle 102 “X” degrees, apply a certain magnitude of braking force, etc. The vehicle controller (e.g., the vehicle interface) can help facilitate the responsible vehicle control (e.g., braking control system, steering control system, acceleration control system, etc.) to execute the instructions and implement the motion plan 162 (e.g., by sending control signal(s), making the translated plan available, etc.). This can allow the vehicle 102 to autonomously travel within the vehicle's surrounding environment.

As shown in FIG. 1, the vehicle computing system 110 can include a motion planner 164 that is configured to generate motion plans 162 and/or assist in generating motion plans 162. Motion planner 164 can be configured to generate a target trajectory for the autonomous vehicle in example embodiments. Motion planner 164 can receive sensor data and map data associated with the environment external to the autonomous vehicle, and process the sensor data and map data to generate a target trajectory for the autonomous vehicle. In some examples, such as that depicted in FIG. 1, motion planner 164 can be configured as a separate system from perception system 140, prediction system 150, and/or motion planning system 160. Motion planner 164 may receive sensor data 118 and map data 132 and generate one or more target trajectories that are provided to motion planning system 160 in order to generate motion plans 162. In another example, motion planner 164 can be integrated within motion planning system 160. In some instances, motion planner 164 may be configured to perform perception and prediction tasks such that a separate perception system 140 and prediction system 150 can be omitted. For example, motion planner 164 can include a motion planning model configured to perform the functions of perception system 140, prediction system 150, and motion planning system 160. In some examples, motion planner 164 can include a perception system 140 and/or prediction system 150. In some examples, motion planner 164 can generate intermediate representations such as perception data 142 and prediction data 152, as well as one or more target trajectories for the autonomous vehicle 102. Vehicle computing system 110 of FIG. 1 can be configured to receive output(s) from motion planner 164. For example, output(s) can be provided from the motion planner 164 to motion planning system 160 and/or vehicle control system 120.

In some examples, motion planner 164 can include an end-to-end learnable and interpretable motion planning model. The machine-learned motion planning model can be configured to receive map data and sensor data such as raw image data, light detection and ranging (LIDAR) data, RADAR data, etc., and to directly generate a motion plan 162 for an autonomous vehicle based on the raw sensor data. The motion planner in some examples may include a single neural network that takes raw sensor data and dynamic map data as an input, and predicts a cost map for motion planning as an output. A single machine-learned motion planning model may be configured in an end-to-end fashion to receive raw sensor data and generate one or more motion plans 162 for the autonomous vehicle. The motion planner may be configured to generate a target trajectory of a motion plan for an autonomous vehicle, as well as one or more intermediate representations of the environment external to the autonomous vehicle based on the sensor data. In this manner, a motion planning model may be trained end-to-end to provide motion planning for an autonomous vehicle based on raw sensor data, while also providing intermediate representations such as perception data 142 and prediction data 152 as an interpretable output. Accordingly, the motion planner may provide an end-to-end driving approach that is optimized for the motion planning task, while also providing intermediate representations that can be accessed to improve the effectiveness of the model, such as by training to optimize the intermediate representations for motion planning.

Although many examples are described herein with respect to autonomous vehicles, the disclosed technology is not limited to autonomous vehicles. In fact, any object capable of collecting sensor data and map data can utilize the technology described herein for generating a target trajectory. For example, a non-autonomous vehicle may utilize aspects of the present disclosure to generate a target trajectory for an operator of the non-autonomous vehicle, notify the vehicle operator of the target trajectory, and take precautionary measures based on the identified target trajectory. Likewise, a smart phone with one or more cameras, a robot, an augmented reality system, and/or another type of system can utilize aspects of the present disclosure to generate target trajectories.

FIG. 2 depicts a block diagram of an example motion planner 164 in accordance with example embodiments of the present disclosure. Motion planner 164 can be configured to determine a target trajectory of a motion plan for an autonomous vehicle (e.g., vehicle 102) based at least in part on sensor data 118 from sensors 116 and/or map data 132. The motion planner 164 can detect and determine information about the current locations of objects and/or predicted future locations and/or moving paths of proximate objects. The motion planner 164 can determine a motion plan for the autonomous vehicle (e.g., vehicle 102) that best navigates the autonomous vehicle (e.g., vehicle 102) along a determined travel route relative to the objects at such locations. The motion planning system 160 can then provide the selected motion plan to a vehicle control system 120 that controls one or more vehicle controls (e.g., actuators or other devices that control gas flow, steering, braking, etc.) to execute the selected motion plan.

Motion planner 164 includes one or more machine-learned motion planning model(s) 202. A motion planning model 202 can be configured to receive sensor data 118 and map data 132, and generate one or more intermediate representations 230 and a cost volume 232 based on the sensor data and map data. Additionally, the motion planning model can be configured to obtain one or more potential trajectories 222 (also referred to as trajectory proposals) for the autonomous vehicle. In some examples, motion planning model 202 can generate a set of potential trajectories based on sensor data and/or map data associated with the autonomous vehicle. Motion planning model 202 can select a target trajectory 234 from the plurality of potential trajectories 222 for the autonomous vehicle. In other examples, motion planning model 202 can optimize a randomly or otherwise sampled trajectory to generate a target trajectory.

Motion planning model 202 may include a backbone network 210 and a trajectory generator 220. Backbone network 210 can receive sensor data 118 and map data 132 as input(s) in some examples. Backbone network 210 can provide as output(s) one or more intermediate representations 230 of objects, as well as a cost volume 232 associated with locations in the environment external to the autonomous vehicle. The intermediate representations 230 and cost volume(s) 232 can be based at least in part on the sensor data 118 and map data 132. The sensor data can include LIDAR data in some examples. Additionally or alternatively, RADAR data, image data, and/or other sensor data indicative of an environment external to or otherwise surrounding an autonomous vehicle can be used.

The backbone network 210 can be configured to generate one or more feature map(s) 212 from the sensor data 118 and/or map data 132. For example, feature map 212 may include one or more tensors associated with sensor data 118 and one or more tensors associated with map data 132. Feature map 212 can be provided as an input to one or more convolutional neural networks 214. Convolutional neural networks 214 can generate the one or more intermediate representations 230 and the one or more cost volumes 232. Feature manager 216 can manage the output(s) of the backbone network 210 to provide the intermediate representations 230 and the cost volumes 232.

In some implementations, the machine-learned motion planning model may be configured to generate one or more interpretable intermediate representations based on the sensor data, such as three-dimensional object detections and/or motion predictions for detected objects. The intermediate representations can include bounding boxes of objects external to the autonomous vehicle for current and/or future timesteps in some examples. The intermediate representations can additionally or alternatively include predictions associated with one or more objects. For example, a predicted position and/or motion can be provided. A predicted motion may include a predicted object path, as well as a velocity and/or acceleration associated with the object. These intermediate representations 230 are provided as an independent output of the backbone network 210 in some examples. In this manner, the intermediate representations 230 are interpretable. For example, the intermediate representations may be accessed by a training computing system during training of the machine-learned motion planning model 202 for generating target trajectories.

The cost volume 232 can define a cost of a plurality of positions or locations that an autonomous vehicle may take within a planning horizon. The cost volume may represent a measure of the desirability of each location or position that the autonomous vehicle may take. For example, the cost or other measure for a particular position may be indicative of a likelihood of safety or other parameter associated with the vehicle taking that position. A lower cost may be indicative of a lower likelihood of collision with another object at a particular location, for example, whereas a higher cost may be indicative of a higher likelihood of collision with another object at the particular location.

The trajectory generator 220 can generate a target trajectory for the autonomous vehicle based at least in part on the cost volume. For instance, the trajectory generator can optimize a sampled trajectory using the cost volume to generate a target trajectory. The trajectory generator can include an optimizer that optimizes a sampled trajectory by minimizing the cost computed for the trajectory using the cost volume. In other examples, the trajectory generator can sample a set of diverse, physically possible trajectories for the autonomous vehicle and compute a score for a set of potential trajectories 222 based on the cost volume. The model can select a target trajectory 234 for the autonomous vehicle based on the trajectory score(s) 226 for a set of potential trajectories 222. For example, the model can select the potential trajectory having the lowest trajectory score (also referred to as the minimum cost) as the target trajectory for the autonomous vehicle. Trajectory generator 220 can receive or access cost volume 232 to compute a trajectory score 226 for each potential trajectory 222. Trajectory generator 220 can generate a plurality of timestep cost indices 224 for each potential trajectory 222. For example, each timestep cost index can represent a cost from a different filter of the cost volume 232. Trajectory generator 220 can sum together the plurality of timestep cost indices for a potential trajectory 222 in order to compute the trajectory score 226.
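As a minimal sketch (not the patent's implementation), the indexing-and-summing scheme above might look as follows in Python, assuming the cost volume is a (T, H, W) array with one cost map per timestep and each candidate trajectory is a (T, 2) array of grid indices; all names and shapes are illustrative assumptions.

```python
import numpy as np

def score_trajectory(cost_volume: np.ndarray, trajectory: np.ndarray) -> float:
    """Sum the per-timestep costs indexed along one candidate trajectory.

    cost_volume: (T, H, W) array, one cost map (filter) per timestep.
    trajectory:  (T, 2) array of (row, col) grid cells, one per timestep.
    """
    T = cost_volume.shape[0]
    timestep_costs = cost_volume[np.arange(T), trajectory[:, 0], trajectory[:, 1]]
    return float(timestep_costs.sum())

def select_target_trajectory(cost_volume, candidates):
    """Pick the candidate trajectory with the minimum summed cost."""
    scores = [score_trajectory(cost_volume, s) for s in candidates]
    best = int(np.argmin(scores))
    return candidates[best], scores[best]
```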

In this manner, the machine-learned motion planning model is able to utilize a nonparametric cost volume that can capture the uncertainty and multimodality in possible AV trajectories, such as the uncertainty and multimodality in changing a lane versus staying in a lane. By utilizing a motion planner that includes an end-to-end machine-learned motion planning model, the motion planning system is capable of handling complex urban scenarios that include traffic light handling, yielding, and interactions with multiple road users.

FIG. 3 depicts additional details of a machine-learned motion planning model 202 including a backbone network 210 and a trajectory generator 220 in accordance with example embodiments of the disclosed technology. In the example depicted in FIG. 3, the machine-learned motion planning model 202 can be configured as an interpretable and end-to-end motion planner. Motion planning model 202 can take as input raw sensor data and map data, and generate as independent outputs of the model one or more intermediate representations 230, one or more cost volumes 232, and one or more target trajectories 234 directly from the raw sensor data and map data.

Backbone network 210 can take as input LIDAR data 302 and high-definition map data 304. Backbone network 210 can generate a feature map 212 including one or more map features 306 and one or more LIDAR features 308. In some examples, the one or more map features 306 may each include an M-channel tensor generated from high-definition map data that contains information about the semantics of the environment external to the autonomous vehicle. For example, semantic data such as the location of lanes, the shape of lane boundaries (e.g., solid, dashed), and the location of signs such as stop signs and the like may be included in the map data. The one or more LIDAR features 308 may each include a three-dimensional tensor based on a plurality of observational sweeps by one or more LIDAR sensors. The LIDAR space can be rasterized into a 3D occupancy grid, where each voxel has a binary value indicating whether it contains a LIDAR point, thereby resulting in the three-dimensional tensor.

The one or more map features 306 and the one or more LIDAR features 308 can be provided as a feature map to a convolutional neural network 214. Convolutional neural network 214 may include a plurality of convolutional layers 310. The plurality of convolutional layers 310 may include one or more convolution layers and one or more deconvolution layers. Convolution layers 310 may generate one or more intermediate representations 230 and one or more cost volumes 232, which are provided as an output of the backbone network by feature manager 216.

The one or more intermediate representations 230 may include bounding boxes 312 representing objects such as other actors in the environment external to the autonomous vehicle. The bounding boxes may represent an object for future timesteps, such as to represent a perception output of the machine-learned motion planning model 202. The motion planning model can generate one or more interpretable intermediate representations 230 in the form of three-dimensional object detections and/or future motion forecasts over the planning horizon for the three-dimensional object detections. In some examples, the interpretable intermediate representations are provided as a first output of the motion planning model.

The motion planning model can additionally generate a space-time cost volume 232 that represents a cost associated with all locations that the autonomous vehicle can take within the planning horizon. The space-time cost volume can be generated as a second output of the motion planning model in some examples. Convolution layers 310 can generate cost volume 232 in some examples. Cost volume 232 can be a space-time cost volume having a dimension of H×W×T, where H represents a height, W represents a width, and T represents a filter number for planning over a horizon with a number of timesteps equal to the filter number.

One or more potential trajectories can be scored and/or optimized using the learned cost volume. In some examples, the trajectory generator can optimize a sampled trajectory using the cost volume. For example, the trajectory generator can include an optimizer that optimizes a sampled trajectory by minimizing the cost computed for the trajectory using the cost volume. In other examples, a series of potential trajectories 222 or trajectory proposals can be scored using the learned cost volume 232. The motion planning model 202 can select a target trajectory 234 based on the score for each potential trajectory. For example, the motion planning model can select the target trajectory having the lowest trajectory score or the minimum cost. The target trajectory, representing the potential trajectory having the minimum cost, can be provided as a third output of the motion planning model 202 in some examples. For a potential trajectory, trajectory generator 220 generates a timestep cost index based at least in part on cost volume 232. For each potential trajectory, its cost is indexed from the different filters of the cost volume and summed together to generate a trajectory score 226 for each potential trajectory 222. For instance, a number of timestep cost indices 224 will be generated that is equal to the filter number T in some examples. As illustrated in FIG. 3, trajectory generator 220 generates timestep cost indices c⁰ through c^(T−1), one for each filter of the cost volume 232. In some examples, each timestep cost index corresponds to one timestep of cost volume 232 over a potential trajectory.

According to some examples, machine-learned motion planning model 202 can be configured as a deep structured interpretable neural motion planner configured for deep structured planning. More formally, consider that s = {s⁰, s¹, . . . , s^(T−1)} can be a trajectory spanning over T timesteps into the future. A location in bird's eye view (BEV) at timestep t can be represented by s^t. The planning problem can then be formulated as a deep structured minimization problem as set forth in Equation 1:

$s^{*} = \underset{s}{\arg\min} \sum_{t} c^{t}\left(s^{t}\right)$   (Equation 1)

In Equation 1, c^(t) can represent a learned cost volume indexed at timestep t. The cost volume can be a two-dimensional tensor with the same size as a region of interest. This minimization can be approximated by sampling a set of physically valid trajectories s, and picking the one with minimum cost. In some examples, machine-learned motion planning model 202 can employ a convolutional network backbone to compute the cost volume. It can first extract features from both LIDAR data and map data, and then feed the feature map into two branches of convolution layers that output three-dimensional detection and motion forecasting, as well as a planning cost volume, respectively.

FIG. 4 depicts a block diagram illustrating additional details of a backbone network 210 in accordance with example embodiments of the present disclosure. Backbone network 210 includes one or more convolutional neural networks 214 configured to generate intermediate representations and a cost volume based on sensor data received from one or more sensors of an autonomous vehicle. A feature map 212 is provided as an input to the convolutional neural network 214. In this example, feature map 212 includes a three-dimensional LIDAR tensor 404 and an M-channel map tensor 406.

The three-dimensional LIDAR tensor 404 can be generated based at least in part on LIDAR data received from one or more LIDAR sensors of the autonomous vehicle. Backbone network 210 can receive LIDAR point clouds as inputs, such as can be captured by one or more LIDAR sensors mounted on the autonomous vehicle. A number T′ (e.g., 10) of consecutive sweeps by the LIDAR sensor can be used as observations, in order to infer the motion of external actors or objects. For the LIDAR sweeps, the backbone network can correct for motion of the autonomous vehicle, and can bring the LIDAR point clouds from a past number of frames (e.g., 10 frames) into the same coordinate system centered at the autonomous vehicle's current location. To make the input data amenable to standard convolutions, the LIDAR space can be rasterized into a 3D occupancy grid, where each voxel includes a binary value indicating whether it contains a LIDAR point. This can result in a three-dimensional tensor of a size H×W×(ZT′), where Z represents the height dimension and H, W represent the x-y spatial dimensions. In some examples, the sweeps can be concatenated along the Z-dimension, thus avoiding three-dimensional convolutions, which are memory and computationally intensive. In other examples, however, concatenation may be omitted.

The M-channel map tensor 406 can be generated based at least in part on high-definition map data received by the backbone network 210. Access to a map can enable accurate motion planning, such as by permitting the autonomous vehicle to drive according to traffic rules (e.g., stop at a red light, follow the lane, change lanes only when allowed). Towards this goal, the backbone network can exploit high-definition maps that contain information about the semantics of the scene, such as lane location, the boundary type (e.g., solid, dashed), and the location of stop signs or other signs. In some examples, the map can be rasterized to form an M-channel tensor, where each channel represents a different map element. For example, map elements can include roads, intersections, lanes, lane boundaries, traffic lights, etc. The resulting feature map 212, including three-dimensional LIDAR tensor 404 and M-channel map tensor 406, can be provided as an input to the convolutional neural network 214. The resulting feature map 212 can have a size H×W×(ZT′+M).
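The following is a minimal sketch of this input construction under stated assumptions: the point clouds are already motion-compensated into the vehicle frame, and the grid shape, voxel size, and function names are illustrative rather than taken from the patent.

```python
import numpy as np

def voxelize_lidar(points, grid_shape=(512, 512, 32), voxel_size=0.2, z_min=-2.0):
    """Rasterize one motion-compensated LIDAR sweep into a binary
    (H, W, Z) occupancy grid centered on the vehicle."""
    H, W, Z = grid_shape
    occupancy = np.zeros(grid_shape, dtype=np.float32)
    rows = (points[:, 1] / voxel_size + H / 2).astype(int)
    cols = (points[:, 0] / voxel_size + W / 2).astype(int)
    zs = ((points[:, 2] - z_min) / voxel_size).astype(int)
    valid = ((0 <= rows) & (rows < H) & (0 <= cols) & (cols < W)
             & (0 <= zs) & (zs < Z))
    occupancy[rows[valid], cols[valid], zs[valid]] = 1.0
    return occupancy

def build_input(sweeps, map_channels):
    """Concatenate T' voxelized sweeps along the Z dimension with an
    M-channel map raster, yielding an (H, W, Z*T' + M) tensor that is
    amenable to standard 2D convolutions."""
    lidar = np.concatenate([voxelize_lidar(p) for p in sweeps], axis=-1)
    return np.concatenate([lidar, map_channels], axis=-1)
```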

In some examples, backbone network 210 can include a plurality of blocks (e.g., five blocks), where each block has a number of two-dimensional convolutional layers (e.g., {2, 2, 3, 6, 5}) with a filter number (e.g., {32, 64, 128, 256, 256}), a filter size (e.g., 3×3), and a stride (e.g., 1). A number of max-pool layers can be provided after each of a number of the blocks (e.g., after each of the first 3 blocks). A multiscale feature map can be generated after the first 4 blocks. For example, the feature maps from each of the first 4 blocks can be resized to a percentage (e.g., one quarter) of the input size, and can be concatenated together in order to increase the effective receptive field. The multiscale features can then be fed into the final block. In some examples, the backbone network can have a downsampling rate, such as of about 4. Other block architectures can be used.
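A PyTorch sketch of such a backbone, using the example layer counts and filter numbers above, might look as follows; the bilinear interpolation mode and exact pooling placement are assumptions, not the patent's specification.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_block(cin, cout, n_layers):
    """n_layers of 3x3 stride-1 convolutions with ReLU activations."""
    layers = []
    for i in range(n_layers):
        layers += [nn.Conv2d(cin if i == 0 else cout, cout, 3, padding=1),
                   nn.ReLU(inplace=True)]
    return nn.Sequential(*layers)

class Backbone(nn.Module):
    """Five blocks with {2, 2, 3, 6, 5} layers and {32, 64, 128, 256, 256}
    filters; max-pooling after the first three blocks; the first four
    feature maps are resized to one quarter of the input size and
    concatenated before the final block (downsampling rate of about 4)."""
    def __init__(self, in_channels):
        super().__init__()
        specs = [(2, 32), (2, 64), (3, 128), (6, 256)]
        self.blocks = nn.ModuleList()
        prev = in_channels
        for n, f in specs:
            self.blocks.append(conv_block(prev, f, n))
            prev = f
        self.final = conv_block(32 + 64 + 128 + 256, 256, 5)

    def forward(self, x):
        quarter = (x.shape[-2] // 4, x.shape[-1] // 4)
        feats = []
        for i, block in enumerate(self.blocks):
            x = block(x)
            feats.append(x)
            if i < 3:  # pool after each of the first three blocks
                x = F.max_pool2d(x, 2)
        multiscale = torch.cat([F.interpolate(f, size=quarter, mode="bilinear",
                                              align_corners=False)
                                for f in feats], dim=1)
        return self.final(multiscale)
```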

Convolutional neural network 214 can include a perception header 410 (also referred to as a perception model head) and a cost volume header 420 (also referred to as a cost volume model head) in example embodiments. Perception header 410 can include two components formed of convolution layers, one for classification and one for regression. For example, the perception header 410 may include a classification branch 412 that includes one or more classification layers and a regression branch 414 that includes one or more regression layers. The classification branch 412 may generate intermediate representations such as bounding boxes 430 indicating the probability of an object at an anchor location. To reduce the variance of regression targets, multiple predefined anchor boxes a_(i,j)^(k) can be employed at each feature map location, where the subscript i, j denotes the location on the feature map and k indexes over the anchors. By way of example, there can be 12 anchors at each location, with different sizes, aspect ratios, and orientations. The classification branch can output a score p_(i,j)^(k) for each anchor indicating the probability of a vehicle at each anchor's location. Regression branch 414 may generate intermediate representations such as motion forecasts 432 for an object over a number of timesteps. Regression branch 414 can also output regression targets for each anchor a_(i,j)^(k) at different time steps. This can include localization offsets l_(x)^(t), l_(y)^(t), sizes s_(w)^(t), s_(h)^(t), and heading angle a_(sin)^(t), a_(cos)^(t).

The superscript t stands for the timeframe, ranging from 0 (the present) to T−1 (into the future). Regression can be performed at every time step, thus producing motion forecasting for each vehicle.
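A sketch of such a two-branch perception header in PyTorch might look like the following, assuming 12 anchors per location, a horizon of T timesteps, and six regression targets per timestep (l_x, l_y, s_w, s_h, a_sin, a_cos); the 1×1 convolutions and sigmoid activation are assumptions.

```python
import torch
import torch.nn as nn

class PerceptionHeader(nn.Module):
    """Classification branch: one score p per anchor per location.
    Regression branch: per-anchor, per-timestep targets
    (l_x, l_y, s_w, s_h, a_sin, a_cos)."""
    def __init__(self, in_channels, num_anchors=12, horizon=10, targets_per_step=6):
        super().__init__()
        self.cls = nn.Conv2d(in_channels, num_anchors, kernel_size=1)
        self.reg = nn.Conv2d(in_channels,
                             num_anchors * horizon * targets_per_step,
                             kernel_size=1)

    def forward(self, features):
        scores = torch.sigmoid(self.cls(features))  # p_{i,j}^k per anchor
        targets = self.reg(features)                # regression targets per timestep
        return scores, targets
```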

Cost volume header 420 can include one or more deconvolution layers 422 and one or more convolution layers 424 in accordance with example embodiments. To produce a cost volume c at the same resolution as the bird's eye view input, two deconvolution layers can be applied on the backbone output with a filter number (e.g., {128, 64}), a filter size (e.g., 3×3), and a stride (e.g., 1). A final convolution layer can be applied with a filter number T, which corresponds to the planning horizon. Each filter can generate a cost volume c^(t) for a future time step t. This allows the machine-learned motion planning model 202 to evaluate the cost of any trajectory s by simply indexing into the cost volume c. In some examples, the cost volume value can be clipped (e.g., between −1000 and +1000) after the network. Applying such bounds can prevent the cost value from shifting arbitrarily, and can make tuning hyperparameters easier.
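A corresponding PyTorch sketch of the cost volume header is below; note that stride-2 transposed convolutions are used here as an assumption so the output actually returns to the bird's eye view resolution given the backbone's downsampling, and the kernel sizes are likewise illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CostVolumeHeader(nn.Module):
    """Two deconvolution layers with {128, 64} filters restore spatial
    resolution; a final convolution with T filters emits one cost map
    per future timestep; values are clipped to [-1000, 1000]."""
    def __init__(self, in_channels, horizon_T=10):
        super().__init__()
        self.deconv1 = nn.ConvTranspose2d(in_channels, 128, 4, stride=2, padding=1)
        self.deconv2 = nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1)
        self.head = nn.Conv2d(64, horizon_T, 3, padding=1)

    def forward(self, features):
        x = F.relu(self.deconv1(features))
        x = F.relu(self.deconv2(x))
        # Clip the cost values to keep them from drifting arbitrarily.
        return torch.clamp(self.head(x), min=-1000.0, max=1000.0)
```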

Cost volume header 420 can compute a corresponding three-dimensional (3D) cost volume 434 given input LIDAR sweeps and an HD map, by the feedforward convolutional operations described above. Referring again to FIG. 3, trajectory generator 220 can compute a final target trajectory 234 by minimizing Equation 1 in some examples. Such an optimization may be considered NP-hard. Accordingly, sampling can be employed by trajectory generator 220 to obtain a low-cost trajectory. A wide variety of diverse trajectories that can be executed by the autonomous vehicle can be sampled by trajectory generator 220, and a final output can be provided that includes the trajectory with the minimum cost according to the learned cost volume 434. Potential trajectories can be efficiently sampled that are physically possible, and the cost volume can be evaluated efficiently in some embodiments.

FIG. 5 depicts a flowchart diagram illustrating an example method 600 for generating a target trajectory using a machine-learned motion planning model according to example embodiments of the present disclosure. One or more portions of method 600 (and the other methods described herein, such as method 630 of FIG. 6 and/or method 650 of FIG. 8) can be implemented by one or more computing devices such as, for example, one or more computing devices of vehicle computing system 110 of FIG. 1 or computing system 1000 of FIG. 10. One or more portions of method 600 can be implemented as an algorithm on the hardware components of the devices described herein (e.g., as in FIGS. 1, 2, 3, 4 and 10) to, for example, generate a target trajectory for an autonomous vehicle. In example embodiments, method 600 may be performed by a motion planner 164 and/or motion planning system 160 implemented using one or more computing devices of a vehicle computing system (e.g., 110).

At (602), sensor data such as LIDAR data, RADAR data, image data, etc. can be obtained from one or more autonomous vehicle sensors onboard an autonomous vehicle. At (604), map data such as high-definition map data including semantic information about an environment external to the autonomous vehicle can be obtained. The sensor data and the map data can be obtained by vehicle computing system 110 in example embodiments.

At (606), the vehicle computing system 110 can input the sensor data and the map data into one or more machine-learned motion planning models (e.g., motion planning model 202) in accordance with example embodiments of the present disclosure.

At (608), the vehicle computing system 110 can receive one or more intermediate representations of objects in the environment external to the autonomous vehicle. The one or more intermediate representations can be generated by the backbone network of the motion planning model in example embodiments. At (610), the vehicle computing system 110 can generate one or more cost volumes that are indicative of a cost associated with each of a plurality of future locations of the autonomous vehicle within a planning horizon. The cost volume(s) can be generated by the backbone network of the motion planning model in example embodiments. In some implementations, the backbone network can generate a feature map based on the sensor data and the map data and provide the feature map as input to one or more convolutional neural networks configured to generate the intermediate representations and the cost volume(s).

At (612), one or more potential trajectories for the autonomous vehicle can be obtained. In some examples, the trajectory generator can sample a plurality of possible locations in Cartesian space to generate a set of potential trajectories. The set of potential trajectories may represent less than all possible trajectories for the autonomous vehicle in order to provide a more efficient trajectory generation process. Various constraints, such as a speed constraint, an acceleration constraint, and/or a turning angle constraint, can be used in order to generate the set of sampled trajectories. In some examples, a trajectory generator can apply a dynamical model to generate the set of potential trajectories according to at least one of the speed constraint, the acceleration constraint, or the turning angle constraint. In some examples, a single potential trajectory can be obtained, such as by random sampling.

At (614), trajectory scores are generated for at least one potential trajectory using the cost volume generated at (610). At (616), a target trajectory can be selected based at least in part on the trajectory score(s) generated at (614). In some examples, the trajectory generator 220 can select the potential trajectory having the minimum cost as the target trajectory at (616). In other examples, the model can select the target trajectory by optimizing a potential trajectory, such as a randomly sampled trajectory, based on the cost volume generated by the backbone network.

FIG. 6 depicts a flowchart diagram illustrating an example method 630 of generating a set of potential trajectories using a machine-learned motion planning model according to example embodiments of the present disclosure. One or more portions of method 630 can be implemented as an algorithm on the hardware components of the devices described herein (e.g., as in FIGS. 1, 2, 3, 4 and 10) to, for example, generate a set of potential trajectories for an autonomous vehicle. In example embodiments, method 630 may be performed by motion planning system 160 implemented using one or more computing devices of a vehicle computing system (e.g., 110).

At (632), the motion planning system imposes a dynamical model to enforce one or more real-world constraints on the autonomous vehicle path. Sampling a trajectory as a set of points in Cartesian space (x, y) ∈ ℝ² may be an inefficient solution. For example, a vehicle may be incapable of executing all possible sets of points in Cartesian space. This can be due to the physical limits of the vehicle, such as speed, acceleration, and turning angle. To consider these real-world constraints, trajectory generator 220 can impose that the vehicle should follow the dynamical model in example embodiments. The dynamical model can represent physical limits such as speed, acceleration, and turning angle. A dynamical model such as a bicycle model or other model of a moving object can be used in planning for autonomous vehicles. A potential trajectory 222 can be defined by the combination of a spatial path (a curve in the 2D plane) and a velocity profile (how fast the vehicle is going along this path). The model can apply the curvature κ of the vehicle's path as approximately proportional to the steering angle ϕ (the angle between the front wheel and the vehicle):

$\kappa = \frac{2\tan(\phi)}{L} \approx \frac{2\phi}{L},$

where L is the distance between the front and rear axles of the autonomous vehicle.

At (634), a planar curve is used to represent the 2D path of the autonomous vehicle. In some examples, the planar curve is a Clothoid curve, also known as an Euler spiral or Cornu spiral. A planar curve such as the Clothoid curve can be used to represent the 2D path of the autonomous vehicle. FIG. 7 depicts one example of a trajectory representation in accordance with example embodiments. A series of steps for defining a trajectory based on a Clothoid curve is depicted.

The curvature κ of a point on the curve can be proportional to the point's distance along the curve from the reference point (i.e., the curvature varies linearly with arc length). Considering a model such as the bicycle model, this linear curvature characteristic corresponds to steering the front wheel angle with constant angular velocity. The canonical form of a Clothoid curve can be defined as set forth in Equations 2, 3, and 4.

$s(\xi) = s_{0} + a\left[C\left(\tfrac{\xi}{a}\right)T_{0} + S\left(\tfrac{\xi}{a}\right)N_{0}\right]$   (Equation 2)

$S(\xi) = \int_{0}^{\xi} \sin\left(\tfrac{\pi u^{2}}{2}\right)\,du$   (Equation 3)

$C(\xi) = \int_{0}^{\xi} \cos\left(\tfrac{\pi u^{2}}{2}\right)\,du$   (Equation 4)

In the above equations, s(ξ) defines a Clothoid curve on a 2D plane, indexed by the distance ξ to the reference point s₀. The notation a is a scaling factor, and T₀ and N₀ are the tangent and normal vectors of this curve at point s₀. S(ξ) and C(ξ) may be referred to as the Fresnel integrals, and can be efficiently computed.
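For instance, Equation 2 can be evaluated numerically with SciPy's Fresnel integrals. The sketch below assumes a reference point at the origin with unit tangent and normal vectors; it is an illustration, not the patent's implementation.

```python
import numpy as np
from scipy.special import fresnel

def clothoid_points(a, xi_values,
                    s0=np.zeros(2),
                    T0=np.array([1.0, 0.0]),
                    N0=np.array([0.0, 1.0])):
    """Evaluate s(xi) = s0 + a * [C(xi/a) T0 + S(xi/a) N0] (Equation 2).

    scipy.special.fresnel returns the pair (S, C) of Fresnel integrals,
    matching Equations 3 and 4.
    """
    S, C = fresnel(np.asarray(xi_values) / a)
    return s0 + a * (np.outer(C, T0) + np.outer(S, N0))
```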

At (636), a longitudinal velocity is defined that specifies the autonomous vehicle motion along the path. In order to fully define a trajectory, a longitudinal velocity (e.g., a velocity profile) can be defined. The longitudinal velocity can specify the autonomous vehicle motion along the path s(ξ): $\dot{\xi}(t) = \ddot{\xi}\,t + \dot{\xi}_{0}$, where $\dot{\xi}_{0}$ is the initial velocity of the autonomous vehicle and $\ddot{\xi}$ is a constant forward acceleration. By combining these velocities and accelerations, the trajectory points s^t can be obtained as set forth in Equation 1.

At (638), a path is sampled based on a scaling factor to establish the shape of the planar curve. Where a Clothoid curve is used, sampling the path can correspond to sampling the scaling factor a as set forth in Equation 2. By way of example, if a driving speed limit (e.g., 15 m/s) is considered, the scaling factor a can be sampled in a range (e.g., 6 to 80 m). Once the scaling factor a is sampled, the shape of the Clothoid curve can be fixed.

At (640), the motion planning system can find a corresponding position on the planar curve using the steering angle of the autonomous vehicle. In some examples, the initial steering angle (e.g., curvature) of the autonomous vehicle can be used to find the corresponding position on the curve. It is noted that Clothoid curves may not handle circle and straight-line trajectories well. Accordingly, these curve types can be sampled separately in some embodiments. For instance, the probability of using straight-line, circle, and Clothoid curves can be 0.5, 0.25, and 0.25, respectively, in some examples. In some cases, a single Clothoid segment can be used to specify the path of an autonomous vehicle. This may be sufficient for a short planning horizon.

At (642), accelerations can be sampled to specify an autonomous vehicle velocity profile. In some examples, constant accelerations can be sampled over a range that specifies the autonomous vehicle's velocity profile. For instance, constant accelerations can be sampled ranging from −5 m/s² to 5 m/s², which specify the autonomous vehicle's velocity profile.

At (644), the sampled planar curves and the velocity profiles can be combined. At (646), the trajectories can be projected to discrete time steps and the corresponding waypoints for which to evaluate the learned cost volume can be obtained. The waypoints can then be used to generate a potential trajectory. A trajectory score for the potential trajectory can be determined by evaluating the cost volume for the waypoints.
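Putting steps (638)-(646) together, a sketch of the sampling procedure might look as follows; the curve-family probabilities, scaling-factor range, and acceleration range come from the text above, while the timestep spacing and all names are illustrative assumptions. The resulting arc lengths would be mapped through the chosen curve (e.g., clothoid_points above) to obtain the waypoints at which the cost volume is evaluated.

```python
import numpy as np

def sample_trajectory_params(v0, horizon_T=10, dt=0.5, rng=None):
    """Sample one trajectory proposal: a curve family (straight / circle /
    Clothoid with probability 0.5 / 0.25 / 0.25), a Clothoid scaling
    factor, and a constant acceleration, projected to discrete timesteps
    as arc length along the path."""
    rng = rng or np.random.default_rng()
    mode = rng.choice(["straight", "circle", "clothoid"], p=[0.5, 0.25, 0.25])
    scale = rng.uniform(6.0, 80.0) if mode == "clothoid" else None  # factor a
    accel = rng.uniform(-5.0, 5.0)            # constant acceleration, m/s^2
    t = dt * np.arange(1, horizon_T + 1)
    arc_length = v0 * t + 0.5 * accel * t**2  # xi(t) at each discrete timestep
    return mode, scale, accel, arc_length
```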

A machine-learned motion planning model in example embodiments can be trained end-to-end with a multitask objective. The ultimate goal or task may be to plan a safe trajectory while following the rules of traffic. The model can be trained to understand where obstacles are and where they will be in the future in order to avoid collisions. Multitask training can be used, with supervision from detection, motion forecasting, as well as human-driven trajectories for the autonomous vehicle. In some examples, supervision for the cost volume may not be available. Accordingly, a max-margin loss can be adopted to push the network to learn to discriminate between “good” and “bad” trajectories. An overall loss function can be defined as set forth in Equation 5.

$\mathcal{L} = \mathcal{L}_{perception} + \beta\, \mathcal{L}_{planning}$   (Equation 5)

The multitask loss set forth by the function in Equation 5 not only directs the network to extract useful features, but also makes the network output interpretable results. This may be beneficial for autonomous vehicles as it can facilitate an understanding of the failure cases to thereby improve the system. For instance, the intermediate representations can be accessed as an output of the model to understand how the system is generating classifications and predictions, for example.

The overall loss function can include a perception loss component and a planning loss component. The planning loss component can represent a planning loss. The planning loss can encourage the minimum cost plan to be similar to the trajectory performed by a human demonstrator in some examples. The planning loss can be sparse in some examples, as a ground truth trajectory may only occupy a small portion of the space. As a consequence, training with this loss alone may be slow and present challenges. Accordingly, a perception loss component can be used that encourages the intermediate representations to produce accurate 3D detections and motion forecasting. This can enable the interpretability of the intermediate representations and can enable faster training in example embodiments.

FIG. 8 depicts a flowchart diagram illustrating an example method 650 of training a machine-learned motion planning model in accordance with example embodiments of the present disclosure. One or more portions of method 650 can be implemented as an algorithm on the hardware components of the devices described herein (e.g., as in FIGS. 1, 2, 3, 4 and 10) to, for example, train a machine-learned motion planning model to generate a target trajectory for an autonomous vehicle based on sensor data and map data. In example embodiments, method 650 may be performed by motion planning system 160 implemented using one or more computing devices of a vehicle computing system (e.g., 110).

At (652), a perception loss component can be defined for the machine-learned motion planning model. The perception loss component can include a classification loss component and a regression loss component in example embodiments. The classification loss can be used for distinguishing a vehicle from the background. The regression loss can be used for generating precise object bounding boxes. For each predefined anchor box, the network can output a classification score as well as several regression targets. A classification score p_(i,j)^(k) can be defined to indicate the probability of existence of a vehicle or other object at a particular anchor. A cross entropy loss can be employed for the classification. In example embodiments, the cross entropy loss can be defined as set forth in Equation 6.

$\mathcal{L}_{cla} = \sum_{i,j,k} \left( q_{i,j}^{k} \log p_{i,j}^{k} + \left(1 - q_{i,j}^{k}\right) \log\left(1 - p_{i,j}^{k}\right) \right)$   (Equation 6)

In Equation 6, q_(i,j)^(k) can represent the class label for an anchor. For example, q_(i,j)^(k) can be set equal to “1” to represent a vehicle at the anchor, or can be set equal to “0” to represent background at the anchor. The regression outputs can include information such as position, shape, and heading angle at each timeframe t, as set forth in Equations 7, 8, and 9.

$l_{x} = \frac{x^{a} - x^{l}}{w^{l}}, \quad l_{y} = \frac{y^{a} - y^{l}}{h^{l}}$   (Equation 7)

$s_{w} = \log\frac{w^{a}}{w^{l}}, \quad s_{h} = \log\frac{h^{a}}{h^{l}}$   (Equation 8)

$a_{\sin} = \sin\left(\theta^{a} - \theta^{l}\right), \quad a_{\cos} = \cos\left(\theta^{a} - \theta^{l}\right)$   (Equation 9)

In Equations 7, 8, and 9, the superscript a can represent an anchor and the superscript l can represent a label. A weighted smooth L1 loss over all the outputs can be used. The overall perception loss can be defined as set forth in Equation 10.

$\mathcal{L}_{perception} = \sum \left( \mathcal{L}_{cla} + \alpha \sum_{t=0}^{T} \mathcal{L}_{reg}^{t} \right)$   (Equation 10)

The regression loss can be summed over all vehicle-correlated anchors, from the current timeframe to the prediction horizon T. In this manner, the model can be taught or trained to predict the position of vehicles in every timeframe.
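A PyTorch sketch of the perception loss in Equations 6 and 10 is below, assuming anchor scores already passed through a sigmoid, a boolean mask marking the vehicle-associated (positive) anchors, and illustrative tensor layouts.

```python
import torch
import torch.nn.functional as F

def perception_loss(scores, labels, reg_pred, reg_target, positive_mask, alpha=1.0):
    """L_perception = L_cla + alpha * sum_t L_reg^t (Equation 10 sketch)."""
    # Equation 6: cross entropy between anchor scores p and class labels q.
    cls_loss = F.binary_cross_entropy(scores, labels, reduction="sum")
    # Smooth L1 over the regression targets of positive anchors only,
    # summed over all timesteps of the prediction horizon.
    reg_loss = F.smooth_l1_loss(reg_pred[positive_mask],
                                reg_target[positive_mask],
                                reduction="sum")
    return cls_loss + alpha * reg_loss
```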

To find a training label for each anchor, the anchor can be associated with its neighboring ground truth bounding box. For example, all the ground truth boxes with an intersection over union (IoU) higher than 0.4 for each anchor can be considered. The ground truth box with the highest IoU can be associated with the anchor, and the class label and regression targets can be computed accordingly. In some aspects, any non-assigned ground truth boxes can be associated with their nearest neighbor. The remaining anchors can be treated as background, and may not be considered in the regression loss in some examples. It is noted that one ground truth box may be associated with multiple anchors, but each anchor is associated with only one ground truth box. During training, hard negative mining can be applied to overcome the imbalance between positive and negative samples in some examples.

At (654), a planning loss component can be defined. Training a reasonable cost volume can be challenging in the absence of a ground truth cost volume. To overcome such difficulties, a max-margin loss can be utilized to define the planning loss component. The max-margin loss can be minimized, where the ground truth trajectory is used as a positive training example, and randomly sampled trajectories are used as negative training examples. Such training encourages the ground truth trajectory to have the minimal cost, and other trajectories to have higher costs. More specifically, assume a ground truth trajectory {(x^(t), y^(t))} for the next T time steps, where (x^(t), y^(t)) is the position of the autonomous vehicle at time step t. The cost volume value at the point (x^(t), y^(t)) can be defined as ĉ^(t). A number of negative trajectories equal to N can be sampled. The ith negative trajectory can be defined as {(x_(i)^(t), y_(i)^(t))} and the cost volume value at these points can be defined as c_(i)^(t). The overall max-margin loss can be defined as set forth in Equation 11.

$\mathcal{L}_{costmap} = \sum_{\{(x^{t}, y^{t})\}} \left( \max_{1 \leq i \leq N} \left( \sum_{t=1}^{T} \left[ \hat{c}^{t} - c_{i}^{t} + d_{i}^{t} + \gamma_{i}^{t} \right]_{+} \right) \right)$   (Equation 11)

In Equation 11, the innermost summation denotes the discrepancy between the ground truth trajectory and one negative trajectory sample, which is a sum of per-timestep losses. The notation [ ]₊ represents a ReLU function. The ReLU is designed to be inside the summation rather than outside, as this can prevent the cost volume at one time step from dominating the whole loss. The distance between the negative trajectory and the ground truth trajectory can be defined as d_(i)^(t) = ∥(x^(t), y^(t)) − (x_(i)^(t), y_(i)^(t))∥₂, and can be used to encourage negative trajectories far from the ground truth trajectory to have much higher cost. A traffic rule violation cost can be defined as γ_(i)^(t). The traffic violation cost in some examples can be a nonzero constant only if the negative trajectory i violates traffic rules at time t (e.g., moving before red lights, colliding with other vehicles, etc.). This can be used to determine how ‘bad’ the negative samples are. As a result, it can penalize those rule-violating trajectories more severely, to train the machine-learned motion planning model to avoid dangerous behaviors. After computing the discrepancy between the ground truth trajectory and each negative sample, the worst case can be optimized by the max operation. Optimizing the worst case can encourage the model to learn a cost volume that discriminates good trajectories from bad trajectories.
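A sketch of this max-margin loss in PyTorch follows, assuming trajectories are given as (T, 2) integer grid indices into a (T, H, W) cost volume and that the distance and traffic-violation terms are precomputed per negative sample; shapes and names are illustrative.

```python
import torch

def planning_loss(cost_volume, gt_traj, neg_trajs, distances, violations):
    """Max-margin loss of Equation 11: the worst-case negative trajectory
    should cost more than the ground truth one by a margin of distance
    plus traffic-violation penalty, hinged per timestep."""
    T = cost_volume.shape[0]
    t_idx = torch.arange(T)
    gt_cost = cost_volume[t_idx, gt_traj[:, 0], gt_traj[:, 1]]   # c_hat^t
    margins = []
    for i, neg in enumerate(neg_trajs):
        neg_cost = cost_volume[t_idx, neg[:, 0], neg[:, 1]]      # c_i^t
        # Per-timestep hinge ([ ]_+ is a ReLU), summed over the horizon.
        margins.append(torch.relu(gt_cost - neg_cost
                                  + distances[i] + violations[i]).sum())
    return torch.stack(margins).max()  # optimize the worst case over negatives
```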

At (656), training data can be provided to the machine-learned motion planning model. The training data can include a number of sets of ground truth data. For example, to train a machine-learned model, a training data set can include a large number of previously obtained representations of input data, as well as corresponding labels that describe corresponding outputs associated with the corresponding input data. A training data set can more particularly include a first portion of data corresponding to one or more representations of input data. The input data can, for example, be recorded or otherwise determined while a vehicle is in navigational operation and/or the like. The training dataset can further include a second portion of data corresponding to labels identifying outputs. The labels included within the second portion of data within the training dataset can be manually annotated, automatically annotated, or annotated using a combination of automatic labeling and manual labeling.

At (658), the computing system can input a first portion of a set of ground-truth data into the machine-learned motion planning model. For example, to train the model, a training computing system can input a first portion of a set of ground-truth data (e.g., the first portion of the training dataset) into the machine-learned model to be trained.

At (660), the computing system can receive as output of the machine-learned model, in response to receipt of the ground-truth data, one or more inferences that predict a second portion of the set of ground-truth data. For example, in response to receipt of a first portion of a set of ground-truth data, the machine-learned model can output a target trajectory. This output of the machine-learned model can predict the remainder of the set of ground-truth data (e.g., the second portion of the training dataset).

At (662), one or more discrepancies between the ground truth trajectory and negative training examples are detected. At (664), the computing system can determine a loss function that compares the predicted inferences generated by the machine-learned model to the second portion of the set of ground-truth data. For example, after receiving such predictions, a training computing system can apply or otherwise determine a loss function that compares the inferences output by the machine-learned model to the remainder of the ground-truth data (e.g., ground-truth labels) which the model attempted to predict. In example embodiments, the loss function can be a total loss that includes a planning loss component and a perception loss component.

At (666), the loss function can be backpropagated to jointly train the model to learn a cost volume that discriminates trajectories, as well as to generate intermediate representations that optimize motion planning. At (668), one or more portions of the machine-learned motion planning model can be modified based on the backpropagation at (666). For example, the machine-learned motion planning model can be trained by modifying one or more weights associated with the model. This process of inputting ground-truth data, determining a loss, and backpropagating the loss through the model can be repeated numerous times as part of training the model. For example, the process can be repeated for each of numerous sets of ground-truth data provided within the training dataset.
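As an illustration of steps (658)-(668), one end-to-end training step might be sketched as follows, reusing the hypothetical perception_loss and planning_loss helpers above; the model interface and batch keys are assumptions.

```python
import torch

def train_step(model, optimizer, batch, beta=1.0):
    """One multitask update with L = L_perception + beta * L_planning (Equation 5)."""
    optimizer.zero_grad()
    scores, reg, cost_volume = model(batch["lidar"], batch["map"])
    loss = (perception_loss(scores, batch["labels"], reg,
                            batch["reg_targets"], batch["positive_mask"])
            + beta * planning_loss(cost_volume, batch["gt_traj"],
                                   batch["neg_trajs"], batch["distances"],
                                   batch["violations"]))
    loss.backward()     # backpropagate the total loss through the model
    optimizer.step()    # jointly update perception and planning weights
    return loss.item()
```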

Various means can be configured to perform the methods and processes described herein. FIG. 9 depicts an example of a computing environment including example means for performing the methods and processes described herein. FIG. 9 depicts an example motion planning computing system 702 with units 704-720 for performing operations and functions according to example embodiments of the present disclosure. For example, motion planning computing system 702 can include one or more sensor data unit(s) 704, one or more map data unit(s) 706, one or more cost volume unit(s) 708, one or more intermediate representation unit(s) 710, one or more trajectory sampling unit(s) 712, one or more trajectory scoring unit(s) 714, one or more trajectory selection unit(s) 716, one or more motion planning unit(s) 718, one or more vehicle controlling unit(s) 720, and/or other means for performing the operations and functions described herein. In some implementations, one or more of the units 704-720 may be implemented separately. In some implementations, one or more of the units 704-720 may be a part of or included in one or more other units. These means can include processor(s), microprocessor(s), graphics processing unit(s), logic circuit(s), dedicated circuit(s), application-specific integrated circuit(s), programmable array logic, field-programmable gate array(s), controller(s), microcontroller(s), and/or other suitable hardware. The means can also, or alternately, include software control means implemented with a processor or logic circuitry, for example. The means can include or otherwise be able to access memory such as, for example, one or more non-transitory computer-readable storage media, such as random-access memory, read-only memory, electrically erasable programmable read-only memory, erasable programmable read-only memory, flash/other memory device(s), data registrar(s), database(s), and/or other suitable hardware.

The means can be programmed to perform one or more algorithm(s) for carrying out the operations and functions described herein. For instance, the means can be configured to obtain sensor data such as LIDAR point cloud data associated with an environment external to an autonomous vehicle. The means can be configured to project the LIDAR point cloud data to a bird's eye view representation of the LIDAR point cloud data in some examples. A sensor data unit 704 is one example of a means for obtaining sensor data such as LIDAR point cloud data as described herein.

The means can be configured to obtain map data such as high-definition map data associated with an environment external to an autonomous vehicle. The high-definition map data may include information about the semantics of the environment, such as the location of lanes, the shape of lane boundaries, the location of signs, etc. Such semantic information can enable accurate motion planning, including driving according to traffic rules. A map data unit 706 is one example of a means for obtaining map data such as high-definition map data as described herein. In some examples, sensor data unit 704 and/or map data unit 706 may include means for obtaining the sensor data and the map data and inputting the sensor data and the map data into a machine-learned motion planning model.

The means can be configured to generate a cost volume including data indicative of a cost associated with each of the plurality of future locations of the autonomous vehicle within a planning horizon. A cost volume unit 708 is one example of a means for generating a cost volume. In some examples, a cost volume unit 708 can be part of a backbone network including a cost volume header that is configured to generate the cost volume based at least in part on sensor data and map data. In some examples, the cost volume header can include a set of one or more convolutional network layers. In some examples, the one or more convolutional network layers of the cost volume header can be optimized based on a selected target trajectory.

The means can be configured to generate one or more intermediate representations associated with at least one of an object detection or an object prediction. An intermediate representation unit 710 is one example of a means for generating one or more intermediate representations associated with at least one of an object detection or an object prediction. In some examples, an intermediate representation unit 710 can be part of a backbone network including a perception header configured to generate the one or more intermediate representations based at least in part on sensor data and map data. In some examples, the one or more intermediate representations can include one or more bounding boxes associated with an object detection and one or more motion predictions associated with the object detection. In some examples, the perception header can include a set of one or more convolutional network layers. In some examples, the one or more convolutional network layers of the perception header can be optimized based on an output of the one or more convolutional network layers of the cost volume header. In some examples, the one or more convolutional network layers of the perception header can be optimized based on a selected target trajectory. In some examples, intermediate representation unit 710 may include a backbone network configured as part of the machine-learned motion planning model.

The means can be configured to determine a set of potential trajectories from a plurality of possible trajectories for the autonomous vehicle. A trajectory sampling unit 712 is one example of a means for determining a set of potential trajectories for the autonomous vehicle. In some examples, the trajectory sampling unit 712 may include means for applying a dynamical model to generate the set of potential trajectories according to at least one of a speed constraint, an acceleration constraint, or a turning angle constraint. In some examples, trajectory sampling unit 712 may include a trajectory generator configured as part of a machine-learned motion planning model.

The means can be configured to evaluate a set of potential trajectories for the autonomous vehicle and to generate a trajectory score for at least one potential trajectory based at least in part on a cost volume. A trajectory scoring unit 714 is one example of a means for evaluating a set of potential trajectories for the autonomous vehicle and generating a trajectory score for at least one potential trajectory based at least in part on a cost volume. In some examples, trajectory scoring unit 714 can include means that are configured to generate, using a trajectory generator of a machine-learned motion planning model, a respective cost for each of a plurality of potential trajectories. The respective cost for each of the plurality of potential trajectories can be generated based at least in part on the cost volume associated with such potential trajectory.

The means can be configured to select a target trajectory for the autonomous vehicle from one or more potential trajectories based at least in part on the trajectory score for at least one potential trajectory. A trajectory selection unit 716 is one example of a means for selecting a target trajectory for the autonomous vehicle. In some examples, trajectory selection unit 716 can select a target trajectory from a set of potential trajectories based at least in part on the trajectory score for at least one potential trajectory. In some examples, a trajectory selection unit 716 can include means for selecting a target trajectory for an autonomous vehicle from a set of potential trajectories based at least in part on the trajectory score for each of the set of potential trajectories. In some examples, a trajectory selection unit 716 can include means for selecting a target trajectory based on the respective cost for each of a plurality of potential trajectories. In some examples, trajectory selection unit 716 can include means for selecting a target trajectory based on optimizing at least one potential trajectory for the autonomous vehicle based at least in part on the cost volume generated by the backbone network.

The means can be configured to generate one or more motion plans based at least in part on a selected target trajectory. A motion planning unit 718 is one example of a means for generating one or more motion plans based at least in part on the selected target trajectory. The means can be configured to determine a motion plan for the autonomous vehicle that best navigates the autonomous vehicle along a determined travel route relative to the objects at such locations. In some examples, a motion planning unit 718 can include means for receiving a target trajectory for an autonomous vehicle as an output of a machine-learned motion planning model.

The means can be configured to control one or more vehicle controls (e.g., actuators or other devices that control gas flow, steering, braking, etc.) to execute the selected motion plan. A vehicle controlling unit 720 is one example of a means for controlling motion of the autonomous vehicle to execute the motion plan. In some examples, a vehicle controlling unit 720 can include means for generating one or more vehicle control signals for the autonomous vehicle based at least in part on the target trajectory.

The means can be configured to train a machine-learned motion planning model based at least in part on multitask training with supervision for perception and motion planning. A model training unit 722 is one example of a means for training a machine-learned motion planning model based at least in part on multitask training with supervision for perception and motion planning. In some examples, the model training unit 722 can include means for training a machine-learned motion planning model using a total loss function that includes a perception loss component and a motion planning loss component. In some examples, the perception loss component can include a classification loss associated with distinguishing a vehicle from a background. In some examples, the perception loss component can include a regression loss associated with generating object bounding boxes. In some examples, the motion planning loss component is generated based at least in part on one or more human-driven trajectories. In some examples, the machine-learned motion planning model is jointly trained for motion planning and generating the intermediate representations based on motion planning optimization.

FIG. 10 depicts a block diagram of an example computing system 1000 according to example embodiments of the present disclosure. The example computing system 1000 includes a computing system 1002 and a machine learning computing system 1030 that are communicatively coupled over a network 1080.

In some implementations, the computing system 1002 can perform various operations as part of motion planning for an autonomous vehicle. For example, computing system 1002 can receive sensor data and map data associated with an environment external to an autonomous vehicle, and process the sensor data and the map data to generate a target trajectory for the autonomous vehicle, as part of autonomous vehicle operations. In some implementations, the computing system 1002 can be included in an autonomous vehicle. For example, the computing system 1002 can be on-board the autonomous vehicle. In some embodiments, computing system 1002 can be used to implement vehicle computing system 110. In other implementations, the computing system 1002 is not located on-board the autonomous vehicle. For example, the computing system 1002 can operate offline to obtain sensor data and perform target trajectory generation. The computing system 1002 can include one or more distinct physical computing devices.

The computing system 1002 includes one or more processors 1012 and a memory 1014. The one or more processors 1012 can be any suitable processing device (e.g., a processor core, a microprocessor, an ASIC, an FPGA, a controller, a microcontroller, etc.) and can be one processor or a plurality of processors that are operatively connected. The memory 1014 can include one or more non-transitory computer-readable storage media, such as RAM, ROM, EEPROM, EPROM, one or more memory devices, flash memory devices, etc., and combinations thereof.

The memory 1014 can store information that can be accessed by the one or more processors 1012. For instance, the memory 1014 (e.g., one or more non-transitory computer-readable storage media, memory devices) can store data 1016 that can be obtained, received, accessed, written, manipulated, created, and/or stored. The data 1016 can include, for instance, map data, image or other sensor data captured by one or more sensors, machine-learned models, etc., as described herein. In some implementations, the computing system 1002 can obtain data from one or more memory device(s) that are remote from the computing system 1002.

The memory 1014 can also store computer-readable instructions 1018 that can be executed by the one or more processors 1012. The instructions 1018 can be software written in any suitable programming language or can be implemented in hardware. Additionally or alternatively, the instructions 1018 can be executed in logically and/or virtually separate threads on processor(s) 1012.

For example, the memory 1014 can store instructions 1018 that, when executed by the one or more processors 1012, cause the one or more processors 1012 to perform any of the operations and/or functions described herein, including, for example, generating motion plans including target trajectories for an autonomous vehicle, etc.

According to an aspect of the present disclosure, the computing system 1002 can store or include one or more machine-learned models 1010. As examples, the machine-learned models 1010 can be or can otherwise include various machine-learned models such as, for example, neural networks (e.g., deep neural networks) or other types of models including linear models and/or non-linear models. Example neural networks include feed-forward neural networks, recurrent neural networks (e.g., long short-term memory recurrent neural networks), convolutional neural networks, or other forms of neural networks.

In some implementations, the computing system 1002 can receive the one or more machine-learned models 1010 from the machine learning computing system 1030 over network 1080 and can store the one or more machine-learned models 1010 in the memory 1014. The computing system 1002 can then use or otherwise implement the one or more machine-learned models 1010 (e.g., by processor(s) 1012). In particular, the computing system 1002 can implement the machine-learned model(s) 1010 to generate motion plans including target trajectories, as well as intermediate representations for object detections and predictions, based on sensor data.

The machine learning computing system 1030 includes one or more processors 1032 and a memory 1034. The one or more processors 1032 can be any suitable processing device (e.g., a processor core, a microprocessor, an ASIC, an FPGA, a controller, a microcontroller, etc.) and can be one processor or a plurality of processors that are operatively connected. The memory 1034 can include one or more non-transitory computer-readable storage media, such as RAM, ROM, EEPROM, EPROM, one or more memory devices, flash memory devices, etc., and combinations thereof. In some embodiments, machine learning computing system 1030 can be used to implement vehicle computing system 110.

The memory 1034 can store information that can be accessed by the one or more processors 1032. For instance, the memory 1034 (e.g., one or more non-transitory computer-readable storage media, memory devices) can store data 1036 that can be obtained, received, accessed, written, manipulated, created, and/or stored. The data 1036 can include, for instance, machine-learned models, sensor data, and map data as described herein. In some implementations, the machine learning computing system 1030 can obtain data from one or more memory device(s) that are remote from the machine learning computing system 1030.

The memory 1034 can also store computer-readable instructions 1038 that can be executed by the one or more processors 1032. The instructions 1038 can be software written in any suitable programming language or can be implemented in hardware. Additionally or alternatively, the instructions 1038 can be executed in logically and/or virtually separate threads on processor(s) 1032.

For example, the memory 1034 can store instructions 1038 that, when executed by the one or more processors 1032, cause the one or more processors 1032 to perform any of the operations and/or functions described herein, including, for example, generating motion plans including target trajectories for an autonomous vehicle, and controlling an autonomous vehicle based on the target trajectories.

In some implementations, the machine learning computing system 1030 includes one or more server computing devices. If the machine learning computing system 1030 includes multiple server computing devices, such server computing devices can operate according to various computing architectures, including, for example, sequential computing architectures, parallel computing architectures, or some combination thereof.

In addition to, or as an alternative to, the machine-learned model(s) 1010 at the computing system 1002, the machine learning computing system 1030 can include one or more machine-learned models 1040. As examples, the machine-learned models 1040 can be or can otherwise include various machine-learned models such as, for example, neural networks (e.g., deep neural networks) or other types of models including linear models and/or non-linear models. Example neural networks include feed-forward neural networks, recurrent neural networks (e.g., long short-term memory recurrent neural networks), convolutional neural networks, or other forms of neural networks.

As an example, the machine learning computing system 1030 can communicate with the computing system 1002 according to a client-server relationship. For example, the machine learning computing system 1030 can implement the machine-learned models 1040 to provide a web service to the computing system 1002. For example, the web service can generate motion plans including target trajectories in response to sensor data and/or other data received from an autonomous vehicle.

Thus, machine-learned models 1010 can be located and used at the computing system 1002, and/or machine-learned models 1040 can be located and used at the machine learning computing system 1030.

In some implementations, the machine learning computing system 1030 and/or the computing system 1002 can train the machine-learned models 1010 and/or 1040 through use of a model trainer 1060. The model trainer 1060 can train the machine-learned models 1010 and/or 1040 using one or more training or learning algorithms. One example training technique is backwards propagation of errors. In some implementations, the model trainer 1060 can perform supervised training techniques using a set of labeled training data. In other implementations, the model trainer 1060 can perform unsupervised training techniques using a set of unlabeled training data. The model trainer 1060 can perform a number of generalization techniques to improve the generalization capability of the models being trained. Generalization techniques include weight decay, dropout, or other techniques.
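As a brief illustration of the generalization techniques named above, the sketch below wires dropout into a model and applies weight decay through the optimizer. The layer sizes, rates, and optimizer choice are placeholders for illustration, not values from the disclosure.

    import torch

    # Dropout regularizes by randomly zeroing activations during training.
    model = torch.nn.Sequential(
        torch.nn.Linear(64, 128),
        torch.nn.ReLU(),
        torch.nn.Dropout(p=0.1),
        torch.nn.Linear(128, 1),
    )
    # Weight decay (L2 regularization) is applied through the optimizer.
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, weight_decay=1e-4)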

In particular, the model trainer 1060 can train a machine-learned model 1010 and/or 1040 based on a set of training data 1062. The training data 1062 can include, for example, ground truth data including annotations for sensor data portions and/or vehicle state data. The model trainer 1060 can be implemented in hardware, firmware, and/or software controlling one or more processors.

In some examples, the model trainer 1060 can train a machine-learned model 1010 and/or 1040 configured to generate motion plans including target trajectories as well as intermediate representations associated with one or more of an object detection or an object prediction. In some examples, the machine-learned model 1010 and/or 1040 is trained using sensor data that has been labeled or otherwise annotated as having a correspondence to a detected object, a class of a detected object, etc. By way of example, sensor data collected in association with a particular class of object can be labeled to indicate that it corresponds to an object detection or the particular class. In some instances, the label may be a simple annotation that the sensor data corresponds to a positive training dataset.
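By way of a hedged illustration only, a labeled training example of the kind described above might be organized as follows; every field name and value here is hypothetical, since the disclosure does not fix an annotation format.

    # A hypothetical annotated sample: raw sensor data paired with
    # ground-truth object labels and a human-driven trajectory.
    labeled_example = {
        "lidar_sweep": "sweep_000123.bin",  # sensor data portion (path is illustrative)
        "boxes": [
            # Ground-truth bounding box for an object of class "vehicle".
            {"class": "vehicle", "center": (12.4, -3.1), "size": (4.5, 1.9), "yaw": 0.2},
        ],
        "expert_trajectory": [(0.0, 0.0), (1.2, 0.1), (2.4, 0.3)],  # vehicle state labels
        "is_positive": True,  # simple annotation marking a positive example
    }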

The computing system 1002 can also include a network interface 1024 used to communicate with one or more systems or devices, including systems or devices that are remotely located from the computing system 1002. The network interface 1024 can include any circuits, components, software, etc. for communicating with one or more networks (e.g., 1080). In some implementations, the network interface 1024 can include, for example, one or more of a communications controller, receiver, transceiver, transmitter, port, conductors, software, and/or hardware for communicating data. Similarly, the machine learning computing system 1030 can include a network interface 1064.

The network(s) 1080 can be any type of network or combination of networks that allows for communication between devices. In some embodiments, the network(s) can include one or more of a local area network, wide area network, the Internet, secure network, cellular network, mesh network, peer-to-peer communication link, and/or some combination thereof, and can include any number of wired or wireless links. Communication over the network(s) 1080 can be accomplished, for instance, via a network interface using any type of protocol, protection scheme, encoding, format, packaging, etc.

FIG. 10 illustrates one example computing system 1000 that can be used to implement the present disclosure. Other computing systems can be used as well. For example, in some implementations, the computing system 1002 can include the model trainer 1060 and the training data 1062. In such implementations, the machine-learned models 1010 can be both trained and used locally at the computing system 1002. As another example, in some implementations, the computing system 1002 is not connected to other computing systems.

In addition, components illustrated and/or discussed as being included in one of the computing systems 1002 or 1030 can instead be included in another of the computing systems 1002 or 1030. Such configurations can be implemented without deviating from the scope of the present disclosure. The use of computer-based systems allows for a great variety of possible configurations, combinations, and divisions of tasks and functionality between and among components. Computer-implemented operations can be performed on a single component or across multiple components. Computer-implemented tasks and/or operations can be performed sequentially or in parallel. Data and instructions can be stored in a single memory device or across multiple memory devices.

Computing tasks discussed herein as being performed at computing device(s) remote from the autonomous vehicle can instead be performed at the autonomous vehicle (e.g., via the vehicle computing system), or vice versa. Such configurations can likewise be implemented without deviating from the scope of the present disclosure.

While the present subject matter has been described in detail with respect to specific example embodiments thereof, it will be appreciated that those skilled in the art, upon attaining an understanding of the foregoing, may readily produce alterations to, variations of, and equivalents to such embodiments. Accordingly, the scope of the present disclosure is by way of example rather than by way of limitation, and the subject disclosure does not preclude inclusion of such modifications, variations, and/or additions to the present subject matter as would be readily apparent to one of ordinary skill in the art.

1-20. (canceled)
 21. A computer-implemented method, comprising:
 obtaining sensor data associated with an environment surrounding an autonomous vehicle;
 generating a plurality of trajectory proposals for the autonomous vehicle, the plurality of trajectory proposals respectively corresponding to a plurality of potential paths of the autonomous vehicle through the environment;
 determining a respective score for a respective trajectory proposal of the plurality of trajectory proposals by:
 obtaining one or more cost values associated with a respective potential path corresponding to the respective trajectory proposal, wherein:
 the one or more cost values are obtained from cost data descriptive of costs for a plurality of positions in the environment along the respective potential path, and
 the cost data is generated based at least in part on the sensor data by a machine-learned cost model component;
 selecting a target trajectory from the plurality of trajectory proposals based at least in part on the respective scores for one or more of the trajectory proposals; and
 controlling a motion of the autonomous vehicle based at least in part on the selected target trajectory.
 22. The computer-implemented method of claim 21, wherein:
 the cost data describes costs for possible positions that the autonomous vehicle can take within a planning horizon;
 the respective potential path describes a plurality of proposed positions that the autonomous vehicle can take at a plurality of timesteps within the planning horizon; and
 the method comprises: obtaining, from the cost data, the cost values associated with the proposed positions at the plurality of timesteps.
 23. The computer-implemented method of claim 22, wherein the cost data is generated by extracting features from the sensor data and processing the features with the machine-learned cost model component.
 24. The computer-implemented method of claim 21, comprising:
 generating, using a backbone machine-learned model component, feature data based at least in part on the sensor data;
 processing, using a machine-learned forecasting model component, the feature data to generate forecasting data indicating one or more forecasted positions of an object in the environment; and
 processing, using the machine-learned cost model component, the feature data to generate the cost data.
 25. The computer-implemented method of claim 24, wherein the feature data is based at least in part on map data processed by the backbone machine-learned model component.
 26. The computer-implemented method of claim 22, wherein the cost data comprises a cost volume having dimensions associated with a region of interest that comprises the plurality of positions in the environment.
 27. The computer-implemented method of claim 26, wherein the cost volume comprises a temporal dimension comprising the plurality of timesteps.
 28. The computer-implemented method of claim 21, wherein generating the plurality of trajectory proposals for the autonomous vehicle comprises: sampling a set of physically possible trajectories for the autonomous vehicle.
 29. The computer-implemented method of claim 28, wherein the sampling comprises:
 sampling a shape of a curve;
 sampling a motion parameter comprising at least one of: a velocity parameter or an acceleration parameter; and
 combining the sampled shape and the sampled motion parameter to obtain a respective possible trajectory.
 30. An autonomous vehicle computing system for controlling an autonomous vehicle, the autonomous vehicle computing system comprising:
 one or more processors; and
 one or more non-transitory computer-readable media that store instructions that are executable by the one or more processors to cause the autonomous vehicle computing system to perform operations, the operations comprising:
 obtaining sensor data associated with an environment surrounding an autonomous vehicle;
 generating a plurality of trajectory proposals for the autonomous vehicle, the plurality of trajectory proposals respectively corresponding to a plurality of potential paths of the autonomous vehicle through the environment;
 determining a respective score for a respective trajectory proposal of the plurality of trajectory proposals by:
 obtaining one or more cost values associated with a respective potential path corresponding to the respective trajectory proposal, wherein:
 the one or more cost values are obtained from cost data descriptive of costs for a plurality of positions in the environment along the respective potential path, and
 the cost data is generated based at least in part on the sensor data by a machine-learned cost model component;
 selecting a target trajectory from the plurality of trajectory proposals based at least in part on the respective scores for one or more of the trajectory proposals; and
 controlling a motion of the autonomous vehicle based at least in part on the selected target trajectory.
 31. The autonomous vehicle computing system of claim 30, wherein:
 the cost data describes costs for possible positions that the autonomous vehicle can take within a planning horizon;
 the respective potential path describes a plurality of proposed positions that the autonomous vehicle can take at a plurality of timesteps within the planning horizon; and
 the operations comprise: obtaining, from the cost data, the cost values associated with the proposed positions at the plurality of timesteps.
 32. The autonomous vehicle computing system of claim 31, wherein the cost data is generated by extracting features from the sensor data and processing the features with the machine-learned cost model component.
 33. The autonomous vehicle computing system of claim 30, wherein the operations comprise:
 generating, using a backbone machine-learned model component, feature data based at least in part on the sensor data;
 processing, using a machine-learned forecasting model component, the feature data to generate forecasting data indicating one or more forecasted positions of an object in the environment; and
 processing, using the machine-learned cost model component, the feature data to generate the cost data.
 34. The autonomous vehicle computing system of claim 33, wherein the feature data is based at least in part on map data processed by the backbone machine-learned model component.
 35. The autonomous vehicle computing system of claim 31, wherein the cost data comprises a cost volume having dimensions associated with a region of interest that comprises the plurality of positions in the environment.
 36. The autonomous vehicle computing system of claim 35, wherein the cost volume comprises a temporal dimension comprising the plurality of timesteps.
 37. The autonomous vehicle computing system of claim 30, wherein generating the plurality of trajectory proposals for the autonomous vehicle comprises: sampling a set of physically possible trajectories for the autonomous vehicle.
 38. The autonomous vehicle computing system of claim 37, wherein the sampling comprises:
 sampling a shape of a curve;
 sampling a motion parameter comprising at least one of: a velocity parameter or an acceleration parameter; and
 combining the sampled shape and the sampled motion parameter to obtain a respective possible trajectory.
 39. One or more non-transitory computer-readable media that store instructions that are executable by one or more processors to cause a computing system to perform operations, the operations comprising:
 obtaining sensor data associated with an environment surrounding an autonomous vehicle;
 generating a plurality of trajectory proposals for the autonomous vehicle, the plurality of trajectory proposals respectively corresponding to a plurality of potential paths of the autonomous vehicle through the environment;
 determining a respective score for a respective trajectory proposal of the plurality of trajectory proposals by:
 obtaining one or more cost values associated with a respective potential path corresponding to the respective trajectory proposal, wherein:
 the one or more cost values are obtained from cost data descriptive of costs for a plurality of positions in the environment along the respective potential path, and
 the cost data is generated based at least in part on the sensor data by a machine-learned cost model component;
 selecting a target trajectory from the plurality of trajectory proposals based at least in part on the respective scores for one or more of the trajectory proposals; and
 controlling a motion of the autonomous vehicle based at least in part on the selected target trajectory.
 40. The one or more non-transitory computer-readable media of claim 39, wherein:
 the cost data describes costs for possible positions that the autonomous vehicle can take within a planning horizon;
 the respective potential path describes a plurality of proposed positions that the autonomous vehicle can take at a plurality of timesteps within the planning horizon; and
 the operations comprise: obtaining, from the cost data, the cost values associated with the proposed positions at the plurality of timesteps.